Stock Market Trading via Stochastic Network Optimization

Michael J. Neely 
University of Southern California 
http://www-rcf.usc.edu/~mjneely 



Abstract — We consider the problem of dynamic buying and 
selling of shares from a collection of N stocks with random 
price fluctuations. To limit investment risk, we place an upper 
bound on the total number of shares kept at any time. Assuming 
that prices evolve according to an ergodic process with a mild 
decaying memory property, and assuming constraints on the total 
number of shares that can be bought and sold at any time, 
we develop a trading policy that comes arbitrarily close to 
achieving the profit of an ideal policy that has perfect knowledge 
of future events. Proximity to the optimal profit comes with a 
corresponding tradeoff in the maximum required stock level 
and in the timescales associated with convergence. We then 
consider arbitrary (possibly non-ergodic) price processes, and 
show that the same algorithm comes close to the profit of a 
frame based policy that can look a fixed number of slots into the 
future. Our analysis uses techniques of Lyapunov Optimization 
that we originally developed for stochastic network optimization 
problems. 

Index Terms — Queueing analysis, stochastic control, universal 
algorithms 

I. Introduction 

This paper considers the problem of stock trading in an
economic market with $N$ stocks. We treat the problem in
discrete time with normalized time slots $t \in \{0, 1, 2, \ldots\}$,
where buying and selling transactions are conducted on each
slot. Let $Q(t) = (Q_1(t), \ldots, Q_N(t))$ be a vector of the current
number of shares owned of each stock, called the stock queue.
That is, for each $n \in \{1, \ldots, N\}$, the value of $Q_n(t)$ is an
integer that represents the number of shares of stock $n$. Stock
prices are given by a vector $p(t) = (p_1(t), \ldots, p_N(t))$ and
are assumed to evolve randomly, with mild assumptions to be
made precise in later sections. Each buy and sell transaction
incurs trading costs. Stocks can be sold and purchased on every
slot. Let $\phi(t)$ represent the net profit on slot $t$ (after transaction
costs are paid). The goal is to design a trading policy that
maximizes the long term time average of $\phi(t)$.

For this system model, we enforce the additional constraint
that at most $\mu_n^{\max}$ shares of each stock $n$ can be bought and
sold on a given slot. This ensures that our trading decisions
only gradually change the portfolio allocation. While this
$\mu_n^{\max}$ constraint can significantly limit the ability to take
advantage of desirable prices, and hence limits the maximum
possible long term profit, we show that it can also reduce
investment risk. Specifically, subject to the $\mu_n^{\max}$ constraint,
we develop an algorithm that achieves a time average profit
that is arbitrarily close to optimal, with a tradeoff in the
maximum number of shares $Q_n^{\max}$ required for stock $n$. The
$Q_n^{\max}$ values can be chosen as desired to limit the losses
from a potential collapse of one or more of the stocks. This choice
also impacts the timescales over which profit is accumulated,
where smaller $Q_n^{\max}$ levels lead to faster convergence times.

It is important to note that long term wealth typically
grows exponentially when the $Q_n^{\max}$ and $\mu_n^{\max}$ constraints are
removed. In contrast, it can be shown that these $Q_n^{\max}$ and
$\mu_n^{\max}$ constraints restrict wealth to at most a linear growth.
Therefore, using $Q_n^{\max}$ and $\mu_n^{\max}$ to limit investment risk
unfortunately has a dramatic impact on the long term growth
curve. However, our ability to bound the timescales over which
wealth is earned suggests that our strategy may be useful
in cases when, in addition to a good long-term return, we
also desire noticeable and consistent short-term gains. At the
end of this paper, we briefly describe a modified strategy
that increases $Q_n^{\max}$ and $\mu_n^{\max}$ as wealth progresses, with the
goal of achieving noticeable short-term gains while enabling
exponential wealth increase.

Our approach uses the Lyapunov optimization theory devel- 
oped for stochastic queueing networks in our previous work 
[1][2][3]. Specifically, the work [1][2][3] develops resource 
allocation and scheduling policies for communication and 
queueing networks with random traffic and channels. The 
policies can maximize time average throughput-utility and 
minimize time average power expenditure, as well as optimize 
more general time average attributes, without a-priori knowl- 
edge of the traffic and channel probabilities. The algorithms 
continuously adapt to emerging conditions, and are robust 
to non-ergodic changes in the probability distributions [4]. 
This suggests that similar control techniques can be used 
successfully for stock trading problems. The difference is that 
the queues associated with stock shares are controlled to have 
positive drift (pushing them towards the maximum queue size), 
rather than negative drift (which would push them in the 
direction of the empty state). 

The Dynamic Trading Algorithm that we develop from these 
techniques can be intuitively viewed as a variation on a theme 
of dollar cost averaging, where price downturns are exploited 
by purchasing more stock. However, the actual amount of 
stock that we buy and sell on each slot is determined by 
a constrained optimization of a max-weight functional that 
incorporates transaction costs, current prices, and current stock 
queue levels. 

Much prior work on financial analysis and portfolio optimization
assumes a known probability model for stock prices.
Classical portfolio optimization techniques by Markowitz [5] 
and Sharpe [6] construct portfolio allocations over N stocks to 
maximize profit subject to variance constraints (which model 
risk) over one investment period (see also [7] and references 
therein). Solutions to this problem can be calculated if the 
mean and covariance of stock returns are known. Samuel- 
son considers multi-period problems in [8] using dynamic 
programming, assuming a known product form distribution 
for investment returns. Cover in [9] develops an iterative 
procedure that converges to the constant portfolio allocation 
that maximizes the expected log investment return, assuming a 
known probability distribution that is the same on each period. 
Recent work by Rudoy and Rohrs in [10] [11] considers risk- 
aware optimization with a more complex cointegrated vector 
autoregressive assumption on stock processes, and uses Monte
Carlo simulations over historical stock trajectories to inform
stochastic decisions. Stochastic models of stock prices using
Lévy processes and multi-fractal processes are considered in
[7] [12] [13] and references therein. 

A significant departure from this work is the universal stock 
trading paradigm, as exemplified in prior works of Cover and 
Gluss [14], Larson [15], Cover [16], Merhav and Feder [17], 
and Cover and Ordentlich [18] [19], where trading algorithms 
are developed and shown to provide analytical guarantees for 
any sample path of stock prices. Specifically, the work in 
[14]-[19] seeks to find a non-anticipating trading algorithm 
that yields the same growth exponent as the best constant 
portfolio allocation, where the constant can be optimized with 
full knowledge of the future. The works in [14] [15] develop 
algorithms that come close to the optimal exponent, and the 
work in [16] achieves the optimal exponent under a mild active 
stock assumption on the price sample paths. Similar results 
are derived in [17] using a general framework of sequential 
decision theory. Related results are derived in [18] [19] without 
the active stock assumption, where [19] also treats max-min 
performance when stock prices are chosen by an adversary. 

Our work is similar in spirit to this universal trading
paradigm, in that we do not base decisions on a known (or
estimated) probability distribution. However, our context and
solution methodology are very different. Indeed, the works in
[14]-[19] assume that the entire stock portfolio can be sold and
reallocated on every time period, and allow stock holdings to
grow arbitrarily large. This means that the accumulated profit
is always at risk of one or more stock failures. In our work, we
take a more conservative approach that restricts reallocation
to gradual changes, and that pockets profits while holding no
more than $Q_n^{\max}$ shares of each stock $n$. We also explicitly
account for trading costs and integer constraints on stock
shares, which are not considered in the works [14]-[19]. In this
context, we first design an algorithm under the assumption that 
prices are ergodic with an unknown distribution. In this case, 
we develop a simple non-anticipating algorithm that comes 
arbitrarily close to the optimal time average profit that could 
be earned by an ideal policy with complete knowledge of 
the future. The ideal policy used for comparison can make 
different allocations at different times, and is not restricted to
constant allocations as considered in [14]-[19]. We then show
that the same algorithm can be used for general price sample 
paths, even non-ergodic sample paths without well defined 
time averages. A more conservative guarantee is shown in this 
case: The algorithm yields profit that is arbitrarily close to 
that of a frame based policy with "T-slot lookahead," where 
the future is known up to T slots. Our approach is inspired 
by Lyapunov optimization and decision theory for stochastic 
queueing networks [1]. However, the Lyapunov theory we use 
here involves sample path techniques that are different from 
those in [1]. These techniques might have broader impacts on 
queueing problems in other areas. 

In the next section we present the system model. In Section
III we develop the Dynamic Trading Algorithm and analyze
performance for the simple (and possibly unrealistic) case
when price vectors $p(t)$ are ergodic and i.i.d. over slots. While
this i.i.d. case does not accurately model actual stock prices,
its analysis provides valuable insight. Section IV expands the
analysis to show the same algorithm can handle more general
ergodic processes with a mild decaying memory property.
Section V shows the algorithm also provides performance
guarantees for completely arbitrary price processes (possibly
non-ergodic). A simple enhancement that reduces startup cost
is treated in Section VI, and Section VII briefly considers
an extension that allows for exponential wealth increase by
gradually scaling the $\mu_n^{\max}$ and $Q_n^{\max}$ parameters.

II. System Model 

Let $A(t) = (A_1(t), \ldots, A_N(t))$ be a vector of decision
variables representing the number of new shares purchased
for each stock on slot $t$, and let $\mu(t) = (\mu_1(t), \ldots, \mu_N(t))$
be a vector representing the number of shares sold on slot
$t$. The values $A_n(t)$ and $\mu_n(t)$ are non-negative integers for
each $n \in \{1, \ldots, N\}$. Each purchase of $A$ new shares of
stock $n$ incurs a transaction cost $b_n(A)$ (called the buying cost
function). Likewise, each sale of $\mu$ shares of stock $n$ incurs a
transaction cost $s_n(\mu)$ (called the selling cost function). The
functions $b_n(A)$ and $s_n(\mu)$ are arbitrary, and are assumed only
to satisfy $b_n(0) = s_n(0) = 0$, and to be non-negative,
non-decreasing, and bounded by finite constants $b_n^{\max}$ and $s_n^{\max}$,
so that:

$$0 \le b_n(A) \le b_n^{\max} \quad \text{for } 0 \le A \le \mu_n^{\max}$$

$$0 \le s_n(\mu) \le s_n^{\max} \quad \text{for } 0 \le \mu \le \mu_n^{\max}$$

where for each $n \in \{1, \ldots, N\}$, $\mu_n^{\max}$ is a positive integer
that limits the number of shares of stock $n$ that can be bought
and sold on slot $t$.

A. Example Transaction Cost Functions 

The functions $b_n(A)$ might be linear, representing a transaction
fee that charges per share purchased. Another example
is a fixed cost model with some fixed positive fee $b_n$, so that:

$$b_n(A) = \begin{cases} b_n & \text{if } A > 0 \\ 0 & \text{if } A = 0 \end{cases}$$

Similar models can be used for the $s_n(\mu)$ function. The
simplest model of all is the zero transaction cost model where
the functions $b_n(A)$ and $s_n(\mu)$ are identically zero.
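
The three example cost models can be written down directly. The following is a minimal Python sketch, not part of the original paper: the names `linear_cost`, `fixed_cost`, `zero_cost` and `satisfies_model` are hypothetical helpers, and the last one simply checks the model assumptions (zero cost at zero shares, non-negative, non-decreasing, bounded) over the range of allowed trade sizes.

```python
from typing import Callable

# A cost function maps a share count to a fee (hypothetical type alias).
CostFn = Callable[[int], float]


def linear_cost(fee_per_share: float) -> CostFn:
    """Linear model: the fee grows in proportion to the shares traded."""
    return lambda shares: fee_per_share * shares


def fixed_cost(fee: float) -> CostFn:
    """Fixed model: a flat fee whenever at least one share is traded."""
    return lambda shares: fee if shares > 0 else 0.0


def zero_cost() -> CostFn:
    """Zero transaction cost model."""
    return lambda shares: 0.0


def satisfies_model(cost: CostFn, mu_max: int, cost_max: float) -> bool:
    """Check cost(0) = 0, non-negativity, monotonicity, and the upper
    bound cost_max over the trade sizes 0..mu_max."""
    values = [cost(a) for a in range(mu_max + 1)]
    non_decreasing = all(x <= y for x, y in zip(values, values[1:]))
    bounded = all(0 <= v <= cost_max for v in values)
    return values[0] == 0 and non_decreasing and bounded
```

For instance, `satisfies_model(linear_cost(0.5), 10, 5.0)` holds, while the same function with a bound of 4.0 fails because the fee for 10 shares exceeds the bound.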

B. System Dynamics

The stock price vector $p(t)$ is assumed to be a random
vector process that takes values in some finite set $\mathcal{P} \subset \mathbb{R}^N$,
where $\mathcal{P}$ can have an arbitrarily large number of elements.$^1$
For each $n$, let $p_n^{\max}$ represent a bound on $p_n(t)$, so that:

$$0 \le p_n(t) \le p_n^{\max} \quad \text{for all } t \text{ and all } n \in \{1, \ldots, N\} \tag{1}$$

We assume that buying and selling decisions can be made on
each slot $t$ based on knowledge of $p(t)$. The selling decision
variables $\mu(t)$ are made every slot $t$ subject to the following
constraints:

$$\mu_n(t) \in \{0, 1, \ldots, \mu_n^{\max}\} \quad \text{for all } n \in \{1, \ldots, N\} \tag{2}$$
$$\mu_n(t) p_n(t) \ge s_n(\mu_n(t)) \quad \text{for all } n \in \{1, \ldots, N\} \tag{3}$$
$$\mu_n(t) \le Q_n(t) \quad \text{for all } n \in \{1, \ldots, N\} \tag{4}$$

Constraint (2) ensures that no more than $\mu_n^{\max}$ shares can
be sold of any stock on a single slot. Constraint (3) restricts
to the reasonable case when the money earned from the sale
of a stock must be at least as large as the transaction fee associated
with the sale (violating this constraint would clearly be
sub-optimal).$^2$ Constraint (4) requires the number of shares sold
to be less than or equal to the current number owned.

The buying decision variables $A(t)$ are constrained as
follows:

$$A_n(t) \in \{0, 1, 2, \ldots, \mu_n^{\max}\} \quad \text{for all } n \in \{1, \ldots, N\} \tag{5}$$

$$\sum_{n=1}^{N} A_n(t) p_n(t) \le x \tag{6}$$

where $x$ is a positive value that bounds the total amount
of money used for purchases on slot $t$. For simplicity,
we assume there is always at least a minimum of $x$ and
$\sum_{n=1}^{N} [\mu_n^{\max} p_n^{\max} + b_n(\mu_n^{\max})]$ dollars available for making
purchasing decisions. This model can be augmented by adding
a checking account queue $Q_0(t)$ from which we must draw
money to make purchases, although we omit this aspect for
brevity.

The resulting queueing dynamics for the stock queues $Q_n(t)$
for $n \in \{1, \ldots, N\}$ are thus:

$$Q_n(t + 1) = \max[Q_n(t) - \mu_n(t) + A_n(t), 0] \tag{7}$$

Strictly speaking, the $\max[\cdot, 0]$ operator in the above dynamic
equation is redundant, because the constraint (4) ensures that
the argument inside the $\max[\cdot, 0]$ operator is non-negative.
However, the $\max[\cdot, 0]$ shall be useful for mathematical analysis
when we compare our strategy to that of a queue-independent
strategy that neglects constraint (4).

$^1$The cardinality of the set $\mathcal{P}$ does not enter into our analysis. We assume it
is finite only for the convenience of claiming that the supremum time average
profit $\phi^{opt}$ is achievable by a single "p-only" policy, as described in Section
II-E. Theorems 1, 2, 3 are unchanged if the set $\mathcal{P}$ is infinite, although the
proofs of Theorems 1 and 2 would require an additional limiting argument
over p-only policies that approach $\phi^{opt}$.

$^2$Constraint (3) can be augmented by allowing equality only if $\mu_n(t) = 0$.



C. The Maximum Profit Objective 
Define $\phi(t)$ as the net profit on slot $t$:

$$\phi(t) \triangleq \sum_{n=1}^{N} [\mu_n(t) p_n(t) - s_n(\mu_n(t))] - \sum_{n=1}^{N} [A_n(t) p_n(t) + b_n(A_n(t))] \tag{8}$$

Define $\overline{\phi}$ as the time average expected value of $\phi(t)$ under a
given trading algorithm (temporarily assumed to have a well
defined limit):

$$\overline{\phi} \triangleq \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{\phi(\tau)\}$$

The goal is to design a trading policy that maximizes $\overline{\phi}$. It is
clear that the trivial strategy that chooses $\mu(t) = A(t) = 0$ for
all $t$ yields $\phi(t) = 0$ for all $t$, and results in $\overline{\phi} = 0$. Therefore,
we desire our algorithm to produce a long term profit that
satisfies $\overline{\phi} > 0$.
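
As an illustration of the one-slot accounting, the sketch below evaluates the net profit of definition (8) and the queue update of (7) for a small two-stock example. It is only a sketch under simplifying assumptions: the function names `net_profit`, `update_queues` and `flat_fee` are hypothetical, and a single cost function is shared by all stocks, whereas the model allows a separate pair of cost functions for each stock.

```python
from typing import Callable, List, Sequence

CostFn = Callable[[int], float]


def net_profit(prices: Sequence[float], sold: Sequence[int],
               bought: Sequence[int], sell_cost: CostFn,
               buy_cost: CostFn) -> float:
    """Net profit on one slot: sale revenue minus selling fees, minus
    purchase spending and buying fees (definition (8))."""
    revenue = sum(mu * p - sell_cost(mu) for mu, p in zip(sold, prices))
    spending = sum(a * p + buy_cost(a) for a, p in zip(bought, prices))
    return revenue - spending


def update_queues(queues: Sequence[int], sold: Sequence[int],
                  bought: Sequence[int]) -> List[int]:
    """Stock queue update (7): Q(t+1) = max[Q(t) - mu(t) + A(t), 0]."""
    return [max(q - mu + a, 0) for q, mu, a in zip(queues, sold, bought)]


def flat_fee(shares: int) -> float:
    """Assumed fixed-cost model: one dollar per non-empty transaction."""
    return 1.0 if shares > 0 else 0.0


prices = [10.0, 20.0]
sold = [2, 0]      # sell 2 shares of stock 1
bought = [0, 1]    # buy 1 share of stock 2
profit = net_profit(prices, sold, bought, flat_fee, flat_fee)
queues = update_queues([5, 5], sold, bought)
```

Here the sale earns 20 - 1 = 19 and the purchase costs 20 + 1 = 21, so the slot profit is -2 and the queues move from (5, 5) to (3, 6). The trivial no-trade decision gives zero profit.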

D. Discussion of Constraints 

If we set $x = \sum_{n=1}^{N} [\mu_n^{\max} p_n^{\max} + b_n(\mu_n^{\max})]$ then constraint
(6) is redundant and can be removed. In this case, the multi-stock
problem completely decouples into separate problems
of optimally trading on each of the individual stocks. Trading
on just a single stock is itself an important problem that can
be viewed as a special case of our system model. We add the
constraint (6) for multi-stock problems as it can be used to
limit the total amount spent on new purchases on a single slot.
The constraint (6) can lead to a complex decision on each slot
that is related to the bounded knapsack problem, as discussed
in Section III-A after the description of the Dynamic Trading
Algorithm. The formulation can be modified by replacing the
constraint (6) with the following constraint that often yields a
simpler implementation:

$$\sum_{n=1}^{N} A_n(t) \le A_{tot} \tag{9}$$

where $A_{tot}$ is an integer that bounds the total number of shares
that can be bought on a single slot.

E. The Stochastic Price Vector and p-only Policies 

We first assume the stochastic process $p(t)$ has well defined
time averages (this is generalized to non-ergodic models in
Section V). Specifically, for each price vector $p$ in the finite
set $\mathcal{P}$, we define $\pi(p)$ as the time average fraction of time that
$p(t) = p$, so that:

$$\lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} 1\{p(\tau) = p\} = \pi(p) \quad \text{with probability 1} \tag{10}$$

where $1\{p(\tau) = p\}$ is an indicator function that is 1 if $p(\tau) = p$,
and zero otherwise.

Define a p-only policy as a buying and selling strategy
that chooses virtual decision vectors $A^*(t)$ and $\mu^*(t)$ as a
stationary and possibly randomized function of $p(t)$, constrained
only by (2)-(3) and (5)-(6). That is, the virtual decision
vectors $A^*(t)$ and $\mu^*(t)$ associated with a p-only policy do
not necessarily satisfy the constraint (4) that is required of
the actual decision vectors, and hence these decisions can be
made independently of the current stock queue levels.

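Under a given p-only policy, define the following time
average expectations $d_n^*$ and $\phi^*$:

$$d_n^* \triangleq \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{A_n^*(\tau) - \mu_n^*(\tau)\} \tag{11}$$

$$\phi^* \triangleq \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\left\{ \sum_{n=1}^{N} [\mu_n^*(\tau) p_n(\tau) - s_n(\mu_n^*(\tau))] - \sum_{n=1}^{N} [A_n^*(\tau) p_n(\tau) + b_n(A_n^*(\tau))] \right\} \tag{12}$$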

It is easy to see by (10) that these time averages are well
defined for any p-only policy. For each $n$, the value $d_n^*$
represents the virtual drift of stock queue $Q_n(t)$ associated
with the virtual decisions $A^*(t)$ and $\mu^*(t)$. The value $\phi^*$
represents the virtual profit under virtual decisions $A^*(t)$ and
$\mu^*(t)$. Note that the trivial p-only policy $A^*(t) = \mu^*(t) = 0$
yields $d_n^* = 0$ for all $n$, and $\phi^* = 0$. Thus, we can define
$\phi^{opt}$ as the supremum value of $\phi^*$ over all p-only policies
that yield $d_n^* \ge 0$ for all $n$, and we note that $\phi^{opt} \ge 0$. Using
an argument similar to that given in [2], it can be shown that:

1) $\phi^{opt}$ is achievable by a single p-only policy that satisfies
$d_n^* = 0$ for all $n \in \{1, \ldots, N\}$.

2) $\phi^{opt}$ is greater than or equal to the supremum of the
limsup time average expectation of $\phi(t)$ that can be
achieved over the class of all actual policies that satisfy
the constraints (2)-(6), including ideal policies that use
perfect information about the future. Thus, no policy can
do better than $\phi^{opt}$.

That $\phi^{opt}$ is achievable by a single p-only policy (rather
than by a limit of an infinite sequence of policies) can be
shown using the assumption that the set $\mathcal{P}$ of all price vectors
is finite. That $\phi^{opt}$ bounds the time average profit of all
policies, including those that have perfect knowledge of the
future, can be intuitively understood by noting that the optimal
profit is determined only by the time averages $\pi(p)$. These
time averages are the same (with probability 1) regardless
of whether or not we know the future. The detailed proofs
of these results are similar to those in [2] and are provided
in Appendix C. In the next section we develop a Dynamic
Trading Algorithm that satisfies the constraints (2)-(6) and that
does not know the future or the distribution $\pi(p)$, yet yields
time average profit that is arbitrarily close to $\phi^{opt}$.

To develop our Dynamic Trading Algorithm, we first focus
on the simple case when the vector $p(t)$ is independent and
identically distributed (i.i.d.) over slots, with a general
probability distribution $\pi(p)$. This is an overly simplified model
and does not reflect actual stock time series data. Indeed,
a more accurate model would be to assume the differences
in the logarithm of prices are i.i.d. (see [7] and references
therein). However, we show in Section IV that the same
algorithm developed for the simplified i.i.d. case can also be
used for a general class of ergodic but non-i.i.d. processes
that have a mild decaying memory property (a property held
by all processes that are modulated by finite state Markov
chains). Section V shows the algorithm can also treat arbitrary
(possibly non-ergodic) price models.

F. The i.i.d. Model 

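Suppose $p(t)$ is i.i.d. over slots with $Pr[p(t) = p] = \pi(p)$
for all $p \in \mathcal{P}$. Because the value $\phi^{opt}$ is achievable by a
single p-only policy, and because the expected values of any
p-only policy are the same every slot under the i.i.d. model,
we have the following: There is a p-only policy $A^*(t)$, $\mu^*(t)$
that yields for all $t$ and all $Q(t)$:

$$E\{A_n^*(t) - \mu_n^*(t) \mid Q(t)\} = 0 \quad \text{for all } n \in \{1, \ldots, N\} \tag{13}$$

and

$$E\left\{ \sum_{n=1}^{N} [\mu_n^*(t) p_n(t) - s_n(\mu_n^*(t))] - \sum_{n=1}^{N} [A_n^*(t) p_n(t) + b_n(A_n^*(t))] \,\Big|\, Q(t) \right\} = \phi^{opt} \tag{14}$$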

III. Constructing a Dynamic Trading Algorithm 

The goal is to ensure that all stock queues $Q_n(t)$ are
maintained at reasonably high levels so that there are typically
enough shares available to sell if an opportune price should
arise. To this end, define $\theta_1, \ldots, \theta_N$ as positive real numbers
that represent target queue sizes for the stock queues (soon to
be related to the maximum queue size). The particular values
$\theta_1, \ldots, \theta_N$ shall be chosen later. As a scalar measure of the
distance each queue is away from its target value, we define
the following Lyapunov function $L(Q(t))$:

$$L(Q(t)) \triangleq \frac{1}{2} \sum_{n=1}^{N} (Q_n(t) - \theta_n)^2 \tag{15}$$

Suppose that $Q(t)$ evolves according to some probability law,
and define $\Delta(Q(t))$ as the one-slot conditional Lyapunov
drift:$^3$

$$\Delta(Q(t)) \triangleq E\{L(Q(t + 1)) - L(Q(t)) \mid Q(t)\} \tag{16}$$

As in the stochastic network optimization problems of
[1][2][3], our approach is to take control actions on each slot $t$
to minimize a bound on the "drift-minus-reward" expression:

$$\Delta(Q(t)) - V E\{\phi(t) \mid Q(t)\}$$

where $V$ is a positive parameter to be chosen as desired to
affect the proximity to the optimal time average profit $\phi^{opt}$.

To this end, we first compute a bound on the Lyapunov drift.

Lemma 1: (Lyapunov drift bound) For all $t$ and all possible
values of $Q(t)$, we have:

$$\Delta(Q(t)) \le B - \sum_{n=1}^{N} (Q_n(t) - \theta_n) E\{\mu_n(t) - A_n(t) \mid Q(t)\}$$

where $B$ is a finite constant that satisfies:

$$B \ge \frac{1}{2} \sum_{n=1}^{N} E\{(\mu_n(t) - A_n(t))^2 \mid Q(t)\} \tag{17}$$

Such a finite constant $B$ exists because of the boundedness
assumptions on buy and sell variables $\mu_n(t)$ and $A_n(t)$. In
particular, we have:

$$B \le \frac{1}{2} \sum_{n=1}^{N} (\mu_n^{\max})^2 \tag{18}$$

Proof: See Appendix A. □

$^3$Strictly speaking, proper notation is $\Delta(Q(t), t)$, as the drift may arise
from a non-stationary algorithm. However, we use the simpler notation
$\Delta(Q(t))$ as a formal representation of the right hand side of (16).
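
The drift bound can be sanity-checked numerically. The sketch below is not the proof from Appendix A; it only verifies, on random feasible decisions, the sample path inequality $L(Q(t+1)) - L(Q(t)) \le B - \sum_n (Q_n(t) - \theta_n)(\mu_n(t) - A_n(t))$ with $B$ taken as the right hand side of (18). Taking conditional expectations of such a per-realization inequality gives a bound of the form stated in Lemma 1. All function names and the random test ranges are assumptions made for illustration.

```python
import random
from typing import Sequence


def lyapunov(queues: Sequence[float], targets: Sequence[float]) -> float:
    """Lyapunov function (15): half the squared distance to the targets."""
    return 0.5 * sum((q - th) ** 2 for q, th in zip(queues, targets))


def drift_bound_holds(queues, targets, sold, bought, mu_max) -> bool:
    """Check the one-slot sample path drift inequality for one decision."""
    nxt = [max(q - mu + a, 0) for q, mu, a in zip(queues, sold, bought)]
    drift = lyapunov(nxt, targets) - lyapunov(queues, targets)
    B = 0.5 * sum(m ** 2 for m in mu_max)           # right side of (18)
    cross = sum((q - th) * (mu - a)
                for q, th, mu, a in zip(queues, targets, sold, bought))
    return drift <= B - cross + 1e-9                # small float tolerance


def random_check(trials: int = 1000, seed: int = 0) -> bool:
    """Try random queue states and random decisions that respect
    constraints (2), (4) and (5)."""
    rng = random.Random(seed)
    for _ in range(trials):
        mu_max = [rng.randint(1, 5) for _ in range(3)]
        targets = [rng.uniform(0, 50) for _ in range(3)]
        queues = [rng.randint(0, 60) for _ in range(3)]
        sold = [rng.randint(0, min(m, q)) for m, q in zip(mu_max, queues)]
        bought = [rng.randint(0, m) for m in mu_max]
        if not drift_bound_holds(queues, targets, sold, bought, mu_max):
            return False
    return True
```

The check relies on the identity $(Q - \mu + A - \theta)^2 = (Q - \theta)^2 - 2(Q - \theta)(\mu - A) + (\mu - A)^2$ together with $(\mu - A)^2 \le (\mu^{\max})^2$, which holds because both decisions lie between 0 and $\mu^{\max}$.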
Using Lemma 1 with the definition of $\phi(t)$ in (8), a bound
on the drift-minus-reward expression is given as follows:

$$\Delta(Q(t)) - V E\{\phi(t) \mid Q(t)\} \le B - \sum_{n=1}^{N} (Q_n(t) - \theta_n) E\{\mu_n(t) - A_n(t) \mid Q(t)\}$$
$$\qquad - V \sum_{n=1}^{N} E\{\mu_n(t) p_n(t) - s_n(\mu_n(t)) \mid Q(t)\} + V \sum_{n=1}^{N} E\{A_n(t) p_n(t) + b_n(A_n(t)) \mid Q(t)\} \tag{19}$$

We desire an algorithm that, every slot, observes the $Q(t)$
values and the current prices, and makes a greedy trading
action subject to the constraints (2)-(6) that minimizes the right
hand side of (19).

A. The Dynamic Trading Algorithm 

Every slot $t$, observe $Q(t)$ and $p(t)$ and perform the
following actions.

1) Selling: For each $n \in \{1, \ldots, N\}$, choose $\mu_n(t)$ to solve:

Minimize: $[\theta_n - Q_n(t) - V p_n(t)] \mu_n(t) + V s_n(\mu_n(t))$
Subject to: Constraints (2)-(4)

2) Buying: Choose $A(t) = (A_1(t), \ldots, A_N(t))$ to solve:

Minimize: $\sum_{n=1}^{N} \left( [Q_n(t) - \theta_n + V p_n(t)] A_n(t) + V b_n(A_n(t)) \right)$
Subject to: Constraints (5)-(6)
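
The two per-slot problems can be sketched in Python for the special case in which the spending limit of constraint (6) is large enough never to bind, so that each stock is handled on its own. This is an illustrative sketch under that assumption, not the paper's implementation: the function names are hypothetical, and each problem is solved by plain enumeration over the allowed trade sizes.

```python
from typing import Callable

CostFn = Callable[[int], float]


def sell_decision(q: int, p: float, theta: float, V: float,
                  mu_max: int, sell_cost: CostFn) -> int:
    """Choose mu in {0..mu_max} minimizing
    (theta - q - V*p)*mu + V*sell_cost(mu), subject to (2)-(4)."""
    best_mu, best_val = 0, 0.0        # mu = 0 is always feasible, value 0
    for mu in range(1, min(mu_max, q) + 1):      # constraints (2) and (4)
        if mu * p < sell_cost(mu):               # constraint (3)
            continue
        val = (theta - q - V * p) * mu + V * sell_cost(mu)
        if val < best_val:
            best_mu, best_val = mu, val
    return best_mu


def buy_decision(q: int, p: float, theta: float, V: float,
                 mu_max: int, buy_cost: CostFn) -> int:
    """Choose A in {0..mu_max} minimizing
    (q - theta + V*p)*A + V*buy_cost(A), subject to (5) only
    (constraint (6) is assumed non-binding)."""
    best_a, best_val = 0, 0.0
    for a in range(1, mu_max + 1):
        val = (q - theta + V * p) * a + V * buy_cost(a)
        if val < best_val:
            best_a, best_val = a, val
    return best_a


def flat_fee(shares: int) -> float:
    """Assumed fixed-cost model: one dollar per non-empty transaction."""
    return 1.0 if shares > 0 else 0.0
```

With the hypothetical values V = 10, theta = 110, a limit of 5 shares per slot and 60 shares held, a low price of 4 gives a buy weight of 60 - 110 + 40 = -10, so the sketch buys 5 shares and sells none; a high price of 8 gives a sell weight of 110 - 60 - 80 = -30, so it sells 5 shares and buys none. Ties are broken in favor of the smaller trade.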

The buying algorithm uses the integer constraints (5)-(6),
and is related to the well known bounded knapsack problem
(it is exactly the bounded knapsack problem if the $b_n(\cdot)$ functions
are linear). Implementation of this integer constrained
problem can be complex when the number of stocks $N$ is
large. However, if we use $x = \sum_{n=1}^{N} [\mu_n^{\max} p_n^{\max} + b_n(\mu_n^{\max})]$,
then constraint (6) is effectively removed. In this case, the
stocks are decoupled and the buying algorithm reduces to
making separate decisions for each stock $n$. Alternatively,
the constraint (6) can be replaced by the constraint (9). In
this case, it is easy to see that if buying costs are linear, so
that $b_n(A) = b_n A$ for all $n$ (for some positive constants $b_n$),
then the buying algorithm reduces to successively buying as
much stock as possible from the queues with the smallest (and
negative) $[Q_n(t) - \theta_n + V(p_n(t) + b_n)]$ values. An alternative
relaxation of the constraint (6) is discussed in Section VII-C.
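
The greedy rule for the case of constraint (9) with linear buying costs can be sketched as follows. This is a minimal illustration with hypothetical names and example numbers: stocks are visited in order of increasing weight $Q_n(t) - \theta_n + V(p_n(t) + b_n)$, and shares are bought only while the weight is negative and the total share budget is not exhausted.

```python
from typing import List, Sequence


def greedy_buy(queues: Sequence[int], prices: Sequence[float],
               fees: Sequence[float], thetas: Sequence[float],
               mu_max: Sequence[int], V: float, a_tot: int) -> List[int]:
    """Buying step under constraint (9) with linear costs b_n(A) = b_n*A."""
    weights = [q - th + V * (p + b)
               for q, p, b, th in zip(queues, prices, fees, thetas)]
    order = sorted(range(len(weights)), key=lambda n: weights[n])
    bought = [0] * len(weights)
    remaining = a_tot                       # total share budget A_tot
    for n in order:
        # Stop at the first non-negative weight or when the budget is spent.
        if weights[n] >= 0 or remaining == 0:
            break
        bought[n] = min(mu_max[n], remaining)
        remaining -= bought[n]
    return bought


example = greedy_buy(queues=[50, 60, 120], prices=[3.0, 4.0, 2.0],
                     fees=[0.1, 0.1, 0.1], thetas=[110, 110, 110],
                     mu_max=[5, 5, 5], V=10, a_tot=7)
```

In this example the weights are approximately -29, -9 and 31, so the rule buys 5 shares of the first stock, the remaining 2 shares of the second, and none of the third. The rule is optimal here because the objective is linear in the purchase amounts, with a per-stock cap and a single total budget.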



Lemma 2: For a given $Q(t)$ on slot $t$, the above dynamic
trading algorithm satisfies:

$$B - V\phi(t) - \sum_{n=1}^{N} (Q_n(t) - \theta_n)(\mu_n(t) - A_n(t)) \le B - V\phi^*(t) - \sum_{n=1}^{N} (Q_n(t) - \theta_n)(\mu_n^*(t) - A_n^*(t)) \tag{20}$$

where $A(t)$, $\mu(t)$ are the actual decisions made by the
algorithm, which define $\phi(t)$ by (8), and $A^*(t)$, $\mu^*(t)$ are
any alternative (possibly randomized) decisions that can be
made on slot $t$ that satisfy (2)-(6), which define $\phi^*(t)$ by (8).
Furthermore, we have:

$$\Delta(Q(t)) - V E\{\phi(t) \mid Q(t)\} \le B - \sum_{n=1}^{N} (Q_n(t) - \theta_n) E\{\mu_n^*(t) - A_n^*(t) \mid Q(t)\}$$
$$\qquad - V \sum_{n=1}^{N} E\{\mu_n^*(t) p_n(t) - s_n(\mu_n^*(t)) \mid Q(t)\} + V \sum_{n=1}^{N} E\{A_n^*(t) p_n(t) + b_n(A_n^*(t)) \mid Q(t)\} \tag{21}$$

where the expectation on the right hand side of (21) is with
respect to the random price vector $p(t)$ and the possibly
random actions $A^*(t)$, $\mu^*(t)$ in response to this price vector.

Proof: Given $Q(t)$ on slot $t$, the dynamic algorithm makes
buying and selling decisions to minimize the left hand side of
(20) over all alternative decisions that satisfy (2)-(6). Therefore,
the inequality (20) holds for all realizations of the random
quantities, and hence also holds when taking conditional
expectations of both sides. The conditional expectation of the
left hand side of (20) is equivalent to the right hand side of
the drift-minus-reward expression (19), which proves (21). □

The main idea behind our analysis is that the Dynamic
Trading Algorithm is simple to implement and does not require
knowledge of the future or of the statistics of the price
process $p(t)$. However, it can be compared to alternative
policies $A^*(t)$ and $\mu^*(t)$ (such as in Lemma 2 and in other
lemmas in Sections IV and V that consider more complex
price processes), and these policies possibly have knowledge
both of the price statistics and of the future.

B. Bounding the Stock Queues 

The next lemma shows that the above algorithm does not
sell any shares of stock $n$ if $Q_n(t)$ is sufficiently small.

Lemma 3: Under the above Dynamic Trading Algorithm
and for arbitrary price processes $p(t)$ that satisfy (1), if
$Q_n(t) < \theta_n - V p_n^{\max}$ for some particular queue $n$ and slot $t$,
then $\mu_n(t) = 0$. Therefore, if $Q_n(0) \ge \theta_n - V p_n^{\max} - \mu_n^{\max}$,
then:

$$Q_n(t) \ge \theta_n - V p_n^{\max} - \mu_n^{\max} \quad \text{for all } t$$

Proof: Suppose that $Q_n(t) < \theta_n - V p_n^{\max}$ for some
particular queue $n$ and slot $t$. Then for any $\mu \ge 0$ we have:

$$[\theta_n - Q_n(t) - V p_n(t)]\mu + V s_n(\mu) \ge [\theta_n - Q_n(t) - V p_n^{\max}]\mu + V s_n(\mu)$$
$$\ge [\theta_n - Q_n(t) - V p_n^{\max}]\mu$$
$$\ge 0$$

where the final inequality holds with equality if and only
if $\mu = 0$. Therefore, the Dynamic Trading Algorithm must
choose $\mu_n(t) = 0$.

Now suppose that $Q_n(t) \ge \theta_n - V p_n^{\max} - \mu_n^{\max}$ for some
time $t$. We show it also holds for $t + 1$. If $Q_n(t) \ge \theta_n - V p_n^{\max}$,
then it can decrease by at most $\mu_n^{\max}$ on a single slot, so that
$Q_n(t + 1) \ge \theta_n - V p_n^{\max} - \mu_n^{\max}$. Conversely, if
$\theta_n - V p_n^{\max} > Q_n(t) \ge \theta_n - V p_n^{\max} - \mu_n^{\max}$,
then we know $\mu_n(t) = 0$ and
so the queue cannot decrease on the next slot and we again
have $Q_n(t + 1) \ge \theta_n - V p_n^{\max} - \mu_n^{\max}$. It follows that this
inequality is always upheld if it is satisfied at $t = 0$. □

We note that the above lemma is a sample path statement
that holds for arbitrary (possibly non-ergodic) price processes.
The next lemma also deals with sample paths, and shows that
all queues have a finite maximum size $Q_n^{\max}$.

Lemma 4: Under the above Dynamic Trading Algorithm
and for arbitrary price processes $p(t)$ that satisfy (1), if
$Q_n(t) > \theta_n$ for some particular queue $n$ and slot $t$, then
$A_n(t) = 0$ and so the queue cannot increase on the next slot.
It follows that if $Q_n(0) \le \theta_n + \mu_n^{\max}$, then:

$$Q_n(t) \le \theta_n + \mu_n^{\max} \quad \text{for all } t$$

Proof: Suppose that $Q_n(t) > \theta_n$ for a particular queue
$n$ and slot $t$. Let $A(t) = (A_1(t), \ldots, A_N(t))$ be a vector of
buying decisions that solve the optimization associated with
the Buying algorithm on slot $t$, so that they minimize the
expression:

$$\sum_{m=1}^{N} [Q_m(t) - \theta_m + V p_m(t)] A_m(t) + \sum_{m=1}^{N} V b_m(A_m(t)) \tag{22}$$

subject to (5)-(6). Suppose that $A_n(t) > 0$ (we shall reach
a contradiction). Because the term $[Q_n(t) - \theta_n + V p_n(t)]$
is strictly positive, and because the $b_n(A)$ function is
non-decreasing, we can strictly reduce the value of the expression
(22) by changing $A_n(t)$ to 0. This change still satisfies the
constraints (5)-(6) and produces a strictly smaller sum in (22),
contradicting the assumption that $A(t)$ is a minimizer. Thus,
if $Q_n(t) > \theta_n$, then $A_n(t) = 0$.

Because the queue value can increase by at most $\mu_n^{\max}$ on
any slot, and cannot increase if it already exceeds $\theta_n$, it follows
that $Q_n(t) \le \theta_n + \mu_n^{\max}$ for all $t$, provided that this inequality
holds at $t = 0$. □
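
Because Lemmas 3 and 4 are sample path statements, they can be illustrated with a short simulation. The sketch below makes several simplifying assumptions that are not in the paper: a single stock, zero transaction costs, prices drawn uniformly from $[0, p^{\max}]$, a non-binding spending limit, and the hypothetical target $\theta = V p^{\max} + 2\mu^{\max}$. With zero costs, the selling and buying problems reduce to checking the sign of a single weight.

```python
import random
from typing import Tuple


def simulate_bounds(V: float = 10.0, p_max: float = 10.0, mu_max: int = 5,
                    slots: int = 5000, seed: int = 1) -> Tuple[int, int]:
    """Run the zero-cost, single-stock algorithm on a random price path
    and return the (minimum, maximum) queue level observed."""
    rng = random.Random(seed)
    theta = V * p_max + 2 * mu_max      # assumed target queue size
    q = mu_max                          # initial holdings
    lo = hi = q
    for _ in range(slots):
        p = rng.uniform(0.0, p_max)
        sell_weight = theta - q - V * p         # coefficient of mu
        buy_weight = q - theta + V * p          # coefficient of A
        # With zero costs each objective is linear in the decision, so
        # trade the maximum when the weight is negative, else nothing.
        mu = min(mu_max, q) if sell_weight < 0 else 0
        a = mu_max if buy_weight < 0 else 0
        q = max(q - mu + a, 0)                  # queue update (7)
        lo, hi = min(lo, q), max(hi, q)
    return lo, hi


lowest, highest = simulate_bounds()
```

Under these assumptions the observed levels should stay between $\theta - V p^{\max} - \mu^{\max} = 5$ and $\theta + \mu^{\max} = 115$ for any price path, not only the random one used here, since the argument in the proofs never uses the price distribution.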



C. Analyzing Time Average Profit 

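Theorem 1: Fix any value $V > 0$, and define $\theta_n$ as follows:

$$\theta_n \triangleq V p_n^{\max} + 2\mu_n^{\max} \tag{23}$$

Suppose that initial stock queues satisfy:

$$\mu_n^{\max} \le Q_n(0) \le V p_n^{\max} + 3\mu_n^{\max} \tag{24}$$

If the Dynamic Trading Algorithm is implemented over
$t \in \{0, 1, 2, \ldots\}$, then:

(a) Stock queues $Q_n(t)$ (for $n \in \{1, \ldots, N\}$) are
deterministically bounded for all slots $t$ as follows:

$$\mu_n^{\max} \le Q_n(t) \le V p_n^{\max} + 3\mu_n^{\max} \quad \text{for all } n \text{ and all } t \tag{25}$$

(b) If $p(t)$ is i.i.d. over slots with general distribution
$Pr[p(t) = p] = \pi(p)$ for all $p \in \mathcal{P}$, then for all $t \in \{1, 2, \ldots\}$
we have:

$$\overline{\phi}(t) \ge \phi^{opt} - \frac{B}{V} - \frac{E\{L(Q(0))\}}{Vt} \tag{26}$$

where the constant $B$ is defined by (17) (and satisfies the
inequality (18)), $\phi^{opt}$ is the optimal time average profit, and
$\overline{\phi}(t)$ is the time average expected profit over $t$ slots:

$$\overline{\phi}(t) \triangleq \frac{1}{t} \sum_{\tau=0}^{t-1} E\{\phi(\tau)\} \tag{27}$$

Therefore:

$$\liminf_{t \to \infty} \overline{\phi}(t) \ge \phi^{opt} - B/V \tag{28}$$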
t — >oo 

Theorem 1 shows that the time average expected profit
is within $B/V$ of the optimal value $\phi^{opt}$. Because the $B$
constant is independent of $V$, we can choose $V$ to make
$B/V$ arbitrarily small. This comes with a tradeoff in the
maximum size required for each stock queue that is linear
in $V$. Specifically, the maximum stock level $Q_n^{\max}$ required
for stock $n$ is given as follows:

$$Q_n^{\max} \triangleq V p_n^{\max} + 3\mu_n^{\max}$$

Now suppose that we start with initial condition $Q_n(0) = \mu_n^{\max}$
for all $n$. Then for all $t \in \{1, 2, \ldots\}$ the error
term $L(Q(0))/(Vt)$ is given by:

$$\frac{L(Q(0))}{Vt} = \frac{\sum_{n=1}^{N} (V p_n^{\max} + \mu_n^{\max})^2}{2Vt} = O(V)/t \tag{29}$$

This shows that if $V$ is chosen to be large, then the amount
of time $t$ required to make this error term negligible must
also be large. One can minimize this error term with an initial
condition $Q_n(0)$ that is close to $\theta_n$ for all $n$. However, this
is an artificial savings, as it does not include the startup cost
associated with purchasing that many initial units of stock.
Therefore, the timescales are more accurately described by
the transient given in (29).
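
The tradeoff between the profit gap, the maximum stock level, and the convergence time can be made concrete with a few numbers. The sketch below uses hypothetical parameter values and a hypothetical helper `tradeoff`; it evaluates the upper bound (18) on $B$, the resulting gap $B/V$, the level $V p_n^{\max} + 3\mu_n^{\max}$, and the transient term of (29) for the initial condition $Q_n(0) = \mu_n^{\max}$.

```python
from typing import Dict, Sequence


def tradeoff(V: float, p_max: Sequence[float], mu_max: Sequence[int],
             t: int) -> Dict[str, object]:
    """Evaluate the performance terms of Theorem 1 for given parameters."""
    B = 0.5 * sum(m ** 2 for m in mu_max)               # upper bound (18)
    q_max = [V * p + 3 * m for p, m in zip(p_max, mu_max)]
    # Transient (29), assuming each queue starts at mu_max.
    transient = sum((V * p + m) ** 2
                    for p, m in zip(p_max, mu_max)) / (2 * V * t)
    return {"gap": B / V, "q_max": q_max, "transient": transient}


small_v = tradeoff(V=10.0, p_max=[10.0], mu_max=[5], t=1000)
large_v = tradeoff(V=100.0, p_max=[10.0], mu_max=[5], t=1000)
```

For a single stock with a price bound of 10 and at most 5 shares traded per slot, V = 10 gives a gap of at most 1.25, a maximum level of 115 shares, and a transient of about 0.55 after 1000 slots. Raising V to 100 shrinks the gap bound to 0.125 but raises the maximum level to 1015 shares and the transient at the same horizon to about 5.05.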

One may wonder how the Dynamic Trading Algorithm is
achieving near optimal profit without knowing the distribution
of the price vector $p(t)$, and without estimating this distribution.
The answer is that it uses the queue values themselves
to guide decisions. These queue values $Q_n(t)$ only deviate
significantly from the target $\theta_n$ when inefficient decisions are
made. The values then act as a "sufficient statistic" on which
to base future decisions. The same sufficient statistic holds for
the non-i.i.d. case, as shown in Section IV, so that we do not
need to estimate price patterns or time-correlations, provided
that we allow for a sufficiently large control parameter $V$ and
corresponding large timescales for convergence.

Finally, one may also wonder if the limiting time average
expected profit given in (28) also holds (with probability 1)
for the limiting time average profit (without the expectation).
When $p(t)$ evolves according to a finite state irreducible
Markov chain (as is the case in this i.i.d. scenario), then
the Dynamic Trading Algorithm in turn makes $Q(t)$ evolve
according to a finite state Markov chain, and it can be shown
that the limiting time average expected profit is the same (with
probability 1) as the limiting time average profit.
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D. Proof of Theorem Q] 

Proof: (Theorem [T| part (a)) By Lemma [5] we know that 
Qn{t) > 6l„ - Vp^'''= - ^iJI""^ for all t (provided that this 
holds at t = 0). However, e„ - Vp^"'' - = a^™""^. Thus, 
Qn{t) > Hn"''' for all t, provided that this holds for t = 0. 
Similarly, by Lemma |4] we know that Qn{t) < dn + M™"^ for 
all t (provided that this holds for t = 0), and 9^ + A^™""^ = 

QT""- □ 
Proof: (Theorem 1 part (b)) Fix a slot $t \in \{0, 1, 2, \ldots\}$. To
prove part (b), we plug an alternative set of control choices
$A^*(t)$ and $\mu^*(t)$ into the drift-minus-reward bound (21) of
Lemma 2. Because $p(t)$ is i.i.d., we can choose $A^*(t)$ and
$\mu^*(t)$ as the $p$-only policy that satisfies (13), (14). Note that
we must first ensure this $p$-only policy satisfies the constraint
(4) needed to apply the bound (21). However, we know from
part (a) of this theorem that $Q_n(t) \geq \mu_n^{max}$ for all $n$, and so
the constraint (4) is trivially satisfied. Therefore, we can plug
this policy $A^*(t)$ and $\mu^*(t)$ into (21) and use equalities (13)
and (14) to yield:

$$\Delta(Q(t)) - V E\{\phi(t) \mid Q(t)\} \leq B - V\phi^{opt}$$

Taking expectations of the above inequality over the distribution
of $Q(t)$ and using the law of iterated expectations yields:

$$E\{L(Q(t+1)) - L(Q(t))\} - V E\{\phi(t)\} \leq B - V\phi^{opt}$$

The above holds for all $t \in \{0, 1, 2, \ldots\}$. Summing the above
over $\tau \in \{0, \ldots, t-1\}$ (for some positive integer $t$) yields:

$$E\{L(Q(t)) - L(Q(0))\} - V\sum_{\tau=0}^{t-1} E\{\phi(\tau)\} \leq tB - tV\phi^{opt}$$

Dividing by $tV$, rearranging terms, and using non-negativity
of $L(\cdot)$ yields:

$$\bar{\phi}(t) \geq \phi^{opt} - \frac{B}{V} - \frac{E\{L(Q(0))\}}{Vt}$$

where $\bar{\phi}(t)$ is defined in (27). This proves the result. □

IV. Non-I.I.D. Prices

Here we consider a general class of non-i.i.d. price processes
that have a mild decaying memory property. We first
note that the only place a change is needed is in the proof
of Theorem 1 part (b). Indeed, part (a) of Theorem 1 is a
sample path statement that is true for any $p(t)$ process. That
is, regardless of whether or not $p(t)$ is i.i.d. over slots, and
even if it does not have well defined time averages as in (10),
we still have:

$$\mu_n^{max} \leq Q_n(t) \leq Q_n^{max} \quad \text{for all } n \text{ and all } t$$

provided that this inequality is upheld at time 0, and that the
$\theta_n$ values are defined as in (23).

A. The Decaying Memory Property 

First consider any price vector process $p(t)$ that satisfies
(10), where $\pi(p)$ is the time average fraction of time that
$p(t) = p$. Consider implementing the $p$-only policy that would
achieve (13) and (14) on each slot $t$ if the process were
i.i.d. with the same steady state distribution $\pi(p)$. We call
this the optimal $p$-only policy. Let $A^*(t)$ and $\mu^*(t)$ represent
the resulting decision variables under this policy. Because
these decisions react only to the current $p(t)$, and because
the limiting fraction of time of being in each price state is the
same as the i.i.d. case, the identities (13) and (14) are now
true in the limit as $t \to \infty$ (rather than true on every slot $t$):

$$0 = \lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} E\{A_n^*(\tau) - \mu_n^*(\tau)\} \quad \text{for all } n$$

$$\phi^{opt} = \lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} E\{\phi^*(\tau)\}$$

where $\phi^*(\tau)$ is defined:

$$\phi^*(\tau) \triangleq \sum_{n=1}^{N} \left[\mu_n^*(\tau)p_n(\tau) - s_n(\mu_n^*(\tau))\right] - \sum_{n=1}^{N} \left[A_n^*(\tau)p_n(\tau) + b_n(A_n^*(\tau))\right] \tag{30}$$


We now further assume that the $p(t)$ process achieves time
averages that are close to these limits when summed over an
interval of $T$ slots, regardless of the past history before the
interval. Specifically, let $H(t)$ denote the history of the system
up to slot $t$, defined:

$$H(t) \triangleq [Q(t), Q(t-1), \ldots, Q(0); p(t-1), p(t-2), \ldots, p(0)]$$

Assume there are arbitrarily small values $\epsilon > 0$ for which
there exists a positive integer $T$ (that may depend on $\epsilon$) such
that the optimal $p$-only policy yields the following: For any
slot $t_0 \in \{0, 1, 2, \ldots\}$ and any $H(t_0)$, we have for all
$n \in \{1, \ldots, N\}$:

$$\left| \frac{1}{T}\sum_{\tau=t_0}^{t_0+T-1} E\{A_n^*(\tau) - \mu_n^*(\tau) \mid H(t_0)\} \right| \leq \epsilon \tag{31}$$

and

$$\left| \phi^{opt} - \frac{1}{T}\sum_{\tau=t_0}^{t_0+T-1} E\{\phi^*(\tau) \mid H(t_0)\} \right| \leq \epsilon \tag{32}$$


We say that the stochastic process $p(t)$ has the decaying
memory property if it satisfies (31) and (32). This property
ensures that time averages over any interval of $T$ slots are
uniformly close to their steady state values, regardless of
past history. The simplest model that satisfies this decaying
memory property is the i.i.d. model, for which we can use
$T = 1$ and $\epsilon = 0$. However, the decaying memory property
is also satisfied by any $p(t)$ process that evolves according
to a finite state ergodic Markov chain, where the integer $T$ is
related to the "mixing time" of the chain.
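
As a rough illustration of how $T$ relates to mixing, the following Python sketch takes a finite state ergodic Markov chain and searches for the smallest interval length $T$ for which the expected fraction of the interval spent in each state is within $\epsilon$ of its steady state value, for every possible state just before the interval. This is only a proxy for (31) and (32): those conditions are stated for the decisions of the optimal $p$-only policy, whose conditional expectations are weighted sums of these occupancy fractions. The transition matrices, function names, and iteration limits below are hypothetical.

```python
def step(dist, P):
    """Advance a probability row vector by one slot: dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, iters=2000):
    """Approximate the steady state distribution by repeated stepping."""
    dist = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        dist = step(dist, P)
    return dist


def worst_case_gap(P, pi, T):
    """Largest deviation, over previous states s and states j, between the
    expected fraction of a T-slot interval spent in j and pi[j]."""
    n = len(P)
    gap = 0.0
    for s in range(n):
        dist = list(P[s])  # first slot of the interval, given state s before it
        avg = [0.0] * n
        for _ in range(T):
            avg = [a + d / T for a, d in zip(avg, dist)]
            dist = step(dist, P)
        gap = max(gap, max(abs(a - q) for a, q in zip(avg, pi)))
    return gap


def smallest_T(P, eps, T_limit=1000):
    """Smallest T <= T_limit whose worst-case gap is at most eps, else None."""
    pi = stationary(P)
    for T in range(1, T_limit + 1):
        if worst_case_gap(P, pi, T) <= eps:
            return T
    return None


iid_like = [[0.3, 0.7], [0.3, 0.7]]  # identical rows: the state is i.i.d.
sticky = [[0.9, 0.1], [0.1, 0.9]]    # slowly mixing chain
print(smallest_T(iid_like, 0.05), smallest_T(sticky, 0.05))
```

With identical rows the chain is i.i.d. and $T = 1$ already suffices, while the slowly mixing chain needs a much longer interval for the same $\epsilon$.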

B. Performance 

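Theorem 2: Suppose the Dynamic Trading Algorithm is
implemented, with $\theta_n$ values satisfying (23), and initial condition
that satisfies (24). Then the queue backlog satisfies
the deterministic bound (25). Further, for any pair $T$, $\epsilon$ that
satisfies (31), (32), we have for any integer $M \in \{1, 2, 3, \ldots\}$:

$$\bar{\phi}(MT) \geq \phi^{opt} - C_2\epsilon - \frac{C_1 T}{V} - \frac{E\{L(Q(0))\}}{VMT} \tag{33}$$

and:

$$\liminf_{t\to\infty} \bar{\phi}(t) \geq \phi^{opt} - C_2\epsilon - \frac{C_1 T}{V} \tag{34}$$

where $C_1$ and $C_2$ are defined:

$$C_1 \triangleq \sum_{n=1}^{N} \left[ (\mu_n^{max})^2 \left( \frac{3}{2} + \frac{1}{T} + \frac{1}{2T^2} \right) + \frac{\epsilon\,\mu_n^{max}}{T} \right]$$

$$C_2 \triangleq 1 + \sum_{n=1}^{N} p_n^{max}$$

If $Q(0) = (\mu_1^{max}, \ldots, \mu_N^{max})$, then $L(Q(0))/(VMT)$ has the
form (29) with $t = MT$.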

Proof: The theorem is proven by a Lyapunov drift argument
over $T$-slot frames, and is given in Appendix B. □

Note that the same Dynamic Trading Algorithm as in the
i.i.d. case is used here, without requiring knowledge of $\epsilon$ or $T$.
Indeed, the above performance bounds (33) and (34) hold for
any $\epsilon$, $T$ pair that satisfies (31) and (32). The bounds can thus
be optimized over all such $\epsilon$, $T$ pairs. However, it suffices to
note that such pairs can be found for arbitrarily small values
of $\epsilon$. Thus, choosing a large value of $V$ makes the achieved profit
arbitrarily close to the optimal value $\phi^{opt}$. However, if the $p(t)$
process has a long "mixing time," then the value of $T$ needed
for a given $\epsilon$ will be large, and so the $V$ parameter will also
need to be chosen to be large. Thus, non-i.i.d. $p(t)$ processes
typically require larger queue sizes to ensure close proximity
to the optimal profit.

V. Arbitrary Price Processes 

Here we consider the performance of the Dynamic Trading
Algorithm for an arbitrary price vector process $p(t)$, possibly
a non-ergodic process without a well defined time average
such as that given in (10). In this case, there may not be a well
defined "optimal" time average profit $\phi^{opt}$. However, one can
define $\phi^{opt}(t)$ as the maximum possible time average profit
achievable over the interval $\{0, \ldots, t-1\}$ by an algorithm
with perfect knowledge of the future and that conforms to
the constraints (2)-(6). For the ergodic settings described in
the previous sections, $\phi^{opt}(t)$ has a well defined limiting
value, and our algorithm comes close to its limiting value.
In this (possibly non-ergodic) setting, we do not claim that
our algorithm comes close to $\phi^{opt}(t)$. Rather, we make a less
ambitious claim that our policy yields a profit that is close to
(or greater than) the profit achievable by a frame-based policy
that can look only $T$ slots into the future.

A. The T-Slot Lookahead Performance 

Let $T$ be a positive integer, and fix any slot $t_0 \in \{0, 1, 2, \ldots\}$.
Define $\psi_T(t_0)$ as the optimal profit achievable
over the interval $\{t_0, \ldots, t_0 + T - 1\}$ by a policy that has
perfect a-priori knowledge of the prices $p(\tau)$ over this interval,
and that ensures for each $n \in \{1, \ldots, N\}$ that the total amount
of stock $n$ purchased over this interval is greater than or equal
to the total amount sold. Specifically, $\psi_T(t_0)$ is mathematically
defined according to the following optimization problem that
has decision variables $A(\tau)$, $\mu(\tau)$, and that treats the stock
prices $p(\tau)$ as deterministically known quantities:

$$\text{Max:} \quad \psi \triangleq \sum_{\tau=t_0}^{t_0+T-1}\sum_{n=1}^{N}\left[\mu_n(\tau)p_n(\tau) - s_n(\mu_n(\tau))\right] - \sum_{\tau=t_0}^{t_0+T-1}\sum_{n=1}^{N}\left[A_n(\tau)p_n(\tau) + b_n(A_n(\tau))\right] \tag{35}$$

$$\text{Subj. to:} \quad \sum_{\tau=t_0}^{t_0+T-1} A_n(\tau) \geq \sum_{\tau=t_0}^{t_0+T-1} \mu_n(\tau) \quad \forall n \tag{36}$$

$$\text{Constraints (2), (3), (5), (6)} \tag{37}$$

The value $\psi_T(t_0)$ is equal to the maximizing value in the
above problem (35)-(37). Note that the constraint (36) only
requires the amount of type-$n$ stock purchased to be greater
than or equal to the amount sold by the end of the $T$-slot
interval, and does not require this at intermediate steps of the
interval. This allows the $T$-slot lookahead policy to sell short
stock that is not yet owned, provided that the requisite amount
is purchased by the end of the interval.
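
For intuition, the lookahead value $\psi_T(t_0)$ can be computed by exhaustive search on a very small instance. The Python sketch below is a simplified illustration, not the method of this paper: it assumes a single stock ($N = 1$), zero transaction fees ($s_n = b_n = 0$), integer decisions in $\{0, \ldots, \mu^{max}\}$, and a per-slot spending limit $x$. Constraint (3), which is not restated in this section, is not modeled. The function name and the example prices are hypothetical.

```python
from itertools import product


def lookahead_profit(prices, mu_max, budget):
    """Brute-force T-slot lookahead profit for one stock with zero fees.

    prices: known prices p(tau) over the T-slot frame.
    mu_max: maximum shares bought or sold per slot.
    budget: per-slot spending limit x.
    """
    T = len(prices)
    levels = range(mu_max + 1)
    best = 0  # the all-zero decision is always feasible
    for buys in product(levels, repeat=T):
        # Per-slot budget: A(tau) * p(tau) <= x.
        if any(a * p > budget for a, p in zip(buys, prices)):
            continue
        for sells in product(levels, repeat=T):
            # Frame constraint (36): total bought >= total sold.
            if sum(buys) < sum(sells):
                continue
            profit = sum((m - a) * p for a, m, p in zip(buys, sells, prices))
            best = max(best, profit)
    return best


# Selling at the high price in slot 0 and buying back later is allowed,
# because (36) is only enforced over the whole frame.
print(lookahead_profit([3, 1, 2], mu_max=1, budget=100))
```

The search is exponential in $T$ and is meant only to show the structure of the problem, in particular that short selling within a frame is feasible under (36).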

Note that the trivial decisions $A(\tau) = \mu(\tau) = 0$ for
$\tau \in \{t_0, \ldots, t_0 + T - 1\}$ lead to 0 profit over the interval, and hence
$\psi_T(t_0) \geq 0$ for all $T$ and all $t_0$. Consider now the interval
$\{0, 1, \ldots, MT - 1\}$ that is divided into a total of $M$ frames of
$T$ slots. We show that for any positive integer $M$, our Dynamic
Trading Algorithm yields an average profit over this interval
that is close to the average profit of a $T$-slot lookahead policy
that is implemented on each $T$-slot frame of this interval.

B. The T-Slot Sample Path Drift 

Let $L(Q(t)) = \frac{1}{2}\sum_{n=1}^{N}(Q_n(t) - \theta_n)^2$ be the Lyapunov function defined earlier. For a given
slot $t$ and a given positive integer $T$, define the $T$-slot sample
path drift $\hat{\Delta}_T(t)$ as follows:

$$\hat{\Delta}_T(t) \triangleq L(Q(t+T)) - L(Q(t)) \tag{38}$$

This differs from the one-slot conditional Lyapunov drift defined
earlier in two respects:

• It considers the difference in the Lyapunov function over
$T$ slots, rather than a single slot.

• It is a random variable equal to the difference between
the Lyapunov function on slots $t$ and $t + T$, rather than
a conditional expectation of this difference.

Lemma 5: Suppose the Dynamic Trading Algorithm is implemented,
with $\theta_n$ values satisfying (23), and initial condition
that satisfies (24). Then for any given slot $t_0$ and all integers
$T > 0$, we have:

$$\hat{\Delta}_T(t_0) - V\sum_{\tau=t_0}^{t_0+T-1}\phi(\tau) \leq DT^2 - V\sum_{\tau=t_0}^{t_0+T-1}\phi^*(\tau) + \sum_{n=1}^{N} |Q_n(t_0) - \theta_n| \sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n^*(\tau) - A_n^*(\tau)\right]$$

where $\phi(\tau)$ is defined in (8), and $\phi^*(\tau)$, $\mu^*(\tau)$, $A^*(\tau)$
represent any alternative control actions for slot $\tau$ that satisfy
the constraints (2), (3), (5), (6). Further, the constant $D$ is
given by:

$$D \triangleq \left( \frac{3}{2} + \frac{1}{2T^2} + \frac{1}{T} \right) \sum_{n=1}^{N} (\mu_n^{max})^2 \tag{39}$$

Proof: This lemma is identical to Lemma 8 in Appendix
B, and the proof is given there. □

Theorem 3: Suppose the Dynamic Trading Algorithm is
implemented, with $\theta_n$ values satisfying (23), and initial condition
that satisfies (24). Then for any arbitrary price process
$p(t)$ that satisfies (1), we have:

(a) All queues $Q_n(t)$ are bounded according to (25).

(b) For any positive integers $M$ and $T$, the time average
profit over the interval $\{0, \ldots, MT - 1\}$ satisfies the deterministic
bound:

$$\frac{1}{MT}\sum_{\tau=0}^{MT-1}\phi(\tau) \geq \frac{1}{MT}\sum_{m=0}^{M-1}\psi_T(mT) - \frac{DT}{V} - \frac{L(Q(0))}{MTV} \tag{40}$$

where the $\psi_T(mT)$ values are defined according to the $T$-slot
lookahead policy that uses knowledge of the future to
solve (35)-(37) for each $T$-slot frame. The constant $D$ is
defined in (39), and if $Q(0) = (\mu_1^{max}, \ldots, \mu_N^{max})$ then
$L(Q(0))/(MTV)$ has the form (29) with $t = MT$.

Proof: Part (a) has already been proven in Theorem 1. To
prove part (b), fix any slot $t_0$ and any positive integer $T$. Define
$A^*(\tau)$ and $\mu^*(\tau)$ as the solution of (35)-(37) over the interval
$\tau \in \{t_0, \ldots, t_0 + T - 1\}$. By (37), these decision variables
satisfy constraints (2), (3), (5), (6), and hence can be plugged
into the bound in Lemma 5. Because (35), (36) hold for these
variables, by Lemma 5 we have:

$$\hat{\Delta}_T(t_0) - V\sum_{\tau=t_0}^{t_0+T-1}\phi(\tau) \leq DT^2 - V\psi_T(t_0)$$

Using the definition of $\hat{\Delta}_T(t_0)$ given in (38) yields:

$$L(Q(t_0+T)) - L(Q(t_0)) - V\sum_{\tau=t_0}^{t_0+T-1}\phi(\tau) \leq DT^2 - V\psi_T(t_0)$$

The above inequality holds for all slots $t_0 \in \{0, 1, 2, \ldots\}$.
Letting $t_0 = mT$ and summing over $m \in \{0, 1, \ldots, M-1\}$
(for some positive integer $M$) yields:

$$L(Q(MT)) - L(Q(0)) - V\sum_{\tau=0}^{MT-1}\phi(\tau) \leq MDT^2 - V\sum_{m=0}^{M-1}\psi_T(mT)$$

Rearranging terms and using non-negativity of $L(\cdot)$ proves the
theorem. □

Theorem 3 is stated for general price processes, but has
explicit performance bounds for queue size in terms of the
chosen $V$ parameter, and for profit in terms of $V$ and of the
profit $\psi_T(mT)$ of $T$-slot lookahead policies. Plugging a large
value of $T$ into the bound (40) increases the first term on
the right hand side because it allows for a larger amount of
lookahead. However, this comes with the cost of increasing
the term $DT/V$ that is required to be small to ensure close
proximity to the desired profit. One can use this theorem
with any desired model of stock prices to compute statistics
associated with $\psi_T(mT)$ and hence understand more precisely
the timescales over which near-optimal profit is achieved.
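
The two penalty terms in (40) can be evaluated directly. The Python sketch below computes $DT/V + L(Q(0))/(MTV)$, taking $D = (3/2 + 1/(2T^2) + 1/T)\sum_n (\mu_n^{max})^2$ as in (39) and assuming the initial condition $Q_n(0) = \mu_n^{max}$, so that $L(Q(0)) = \frac{1}{2}\sum_n (V p_n^{max} + \mu_n^{max})^2$. The function name and the parameter values are hypothetical.

```python
def lookahead_gap(V, T, M, p_max, mu_max):
    """Penalty D*T/V + L(Q(0))/(M*T*V) in (40), with Q_n(0) = mu_n^max."""
    sq = sum(m ** 2 for m in mu_max)
    D = (1.5 + 1.0 / (2 * T ** 2) + 1.0 / T) * sq
    # L(Q(0)) = 0.5 * sum (Q_n(0) - theta_n)^2 with theta_n = V p_n^max + 2 mu_n^max.
    L0 = 0.5 * sum((V * p + m) ** 2 for p, m in zip(p_max, mu_max))
    return D * T / V + L0 / (M * T * V)


p_max, mu_max = [10.0], [5]   # hypothetical price bound and trade limit
for T in (1, 10, 100):
    print(T, lookahead_gap(V=10_000, T=T, M=10_000, p_max=p_max, mu_max=mu_max))
```

For fixed $V$ and $M$, the first term grows with the lookahead depth $T$ while the second shrinks, which reflects the tradeoff described above.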



VI. Place-Holder Stock

Theorems 1, 2, 3 require an initial stock level of at least
$\mu_n^{max}$ in all of the $N$ stocks. This can be achieved by initially
purchasing these shares (say, at time $t = -1$). This creates
an initial startup cost that, while independent of $V$, can still
be substantial. It turns out that we can achieve the same
performance as specified in Theorems 1, 2, 3 without paying
this startup cost. This can be done using the concept of place-holder
backlog from [20], which becomes place-holder stock
in our context.

Specifically, suppose that we use $Q(t)$ to represent the
actual amount of stock held on slot $t$, and assume that $Q(0)$
satisfies:

$$0 \leq Q_n(0) \leq V p_n^{max} + 2\mu_n^{max} \quad \text{for all } n \in \{1, \ldots, N\}$$

Define $\hat{Q}(t) \triangleq Q(t) + \mu^{max}$ as an augmented stock vector,
where the vector $\mu^{max}$ is given by:

$$\mu^{max} \triangleq (\mu_1^{max}, \ldots, \mu_N^{max})$$

Notice that the initial value $\hat{Q}(0)$ satisfies (24). Let us implement
the Dynamic Trading Algorithm using the augmented
stock vector $\hat{Q}(t)$. This is equivalent to starting out the system
with an initial amount that includes $\mu_n^{max}$ fake shares of stock
in all queues. We then run the algorithm on the $\hat{Q}(t)$ values,
and any time we are asked to sell stock, we choose to sell
real shares whenever possible. The algorithm breaks if at any
time we are asked to sell at a level that is more than the
number of real shares we have. However, because on every
sample path we have $\hat{Q}_n(t) \geq \mu_n^{max}$, we know that we are
never asked to sell more real shares than we actually have.
Thus, these fake shares simply act as place holders to achieve
the performance that would be achieved if we started out
with $\mu_n^{max}$ units of real shares in all queues. Specifically, we
achieve the performance guarantees specified in Theorems 1, 2, 3
associated with $\hat{Q}(0)$. If all actual queues are initially empty,
then we have $\hat{Q}(0) = \mu^{max}$, and hence we also have transients
corresponding to $L(\hat{Q}(0)) = L(\mu^{max})$, without having to pay
the startup cost of purchasing $\mu_n^{max}$ shares of each stock.

VII. Extensions 

A. Price Jumps and Stock Splits 

We have assumed that prices are bounded by values $p_n^{max}$
for simplicity of exposition. In practice, the $p_n^{max}$ values can
be chosen as price levels that we do not expect to see (perhaps
3 or 4 times the current price). The prediction should be small
enough to maintain reasonably small values for $\theta_n$ and $Q_n^{max}$,
given in (23) and (25).

In the (desirable) situation when the price of a certain stock
$n$ exceeds our estimated upper bound $p_n^{max}$, we can simply
adjust $p_n^{max}$ to a higher value. We must then also appropriately
adjust $\theta_n$ according to (23). This can be viewed as if we are
starting the system off with a new initial condition at this time
(given by the current queue state), with new parameter choices.
Because Theorems 1, 2, 3 are stated in terms of general initial
conditions, the achieved performance is then also determined
by these theorems (applied to the time interval starting at the
current time). Intuitively, this will not "break" the algorithm
because it continuously adapts to emerging conditions.

Similarly, we might have a price go so high as to effect a
stock split. This (desirable) situation can either be modeled by
an increase in the $p_n^{max}$ value (maintaining the same number
of shares, but treating each share as being worth double the
market price), or by doubling the number of shares of that
stock and increasing the $\mu_n^{max}$ and/or the $V$ parameter to allow
for more shares to be maintained. Again, the new situation
can be viewed as creating a new initial condition, and so the
algorithm can adapt to such events.

B. Scaling for Exponential Growth 

Suppose we run the Dynamic Trading Algorithm over a
fixed window of $W$ slots, using parameters $\mu^{max}$ and $V$, with
$\theta^{max}$ defined by (23). Assume we use place-holder stock so
that the actual stock queues are 0 at the beginning of the time
window. If the achieved profit over this window is $z$, then
for any given value $\alpha > 0$, a profit $(1 + \alpha)z$ could have
been achieved if we had scaled the $\mu^{max}$ and $V$ parameters
(and hence $\theta^{max}$ by (23)) by a factor $(1 + \alpha)$ (for simplicity,
we ignore integer constraints in the scaling of $\mu^{max}$ for the
high level discussion of this subsection). Of course, doing this
would require a tolerance to the extra amount of risk associated
with keeping that much more stock in the stock queues.
However, assuming our risk tolerance grows proportionally to
our wealth, this increased risk is tolerable on the next window
of $W$ slots. Specifically, choose a value $T$, and consider the
$T$-slot lookahead policy for comparison using (40) of Theorem
3. Fix a value $\epsilon > 0$, and choose $\mu^{max}$, $V$, and $M$ so that
$DT/V + L(\mu^{max})/(MTV) \leq \epsilon$. Let $W = MT$. Then by
(40) we know that the time average profit over $W$ slots is within
$\epsilon$ of that provided by the $T$-slot lookahead policy.

Now consider consecutive windows of $W$ slots, and define
$q_w$ as the time average profit that would be earned over the
$w$th window if we use place-holder stock with 0 initial stock
levels, and if we use parameters $\mu^{max}$, $V$, and $\theta^{max}$. Let
$q_w^{(T)}$ denote the time average profit of the $T$-slot lookahead
policy over this same window of time. By Theorem 3 we have
that $q_w \geq q_w^{(T)} - \epsilon$ for each window $w \in \{1, 2, \ldots\}$. Define
$\alpha_w \triangleq \beta \max[q_w, 0]$, where $\beta$ is some positive proportionality
constant. Then $\alpha_w$ is non-negative, and if it is positive then it
is proportional to the profit earned over window $w$. On each
window $w > 1$, rather than using parameters $\mu^{max}$, $V$, and
$\theta^{max}$, we scale these by the following factor:

$$(1 + \alpha_1)(1 + \alpha_2) \cdots (1 + \alpha_{w-1})$$

Ignoring integer constraints in this scaling for simplicity, we
know that the time average profit earned over window $w$ is at
least:

$$(q_w^{(T)} - \epsilon)(1 + \alpha_1)(1 + \alpha_2) \cdots (1 + \alpha_{w-1})$$

It follows that our wealth increases exponentially as
$(1 + \alpha_1)(1 + \alpha_2)(1 + \alpha_3) \cdots$, where the profit coefficients $\alpha_w$ are
close to those associated with the $T$-slot lookahead policy. In
particular, the coefficients $\alpha_w$ are all greater than a uniform
positive number whenever $q_w^{(T)} \geq 2\epsilon$ for all $w \in \{1, 2, \ldots\}$.
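
The window-by-window scaling can be sketched in a few lines of Python. The function below returns the factor $(1 + \alpha_1) \cdots (1 + \alpha_{w-1})$ applied on each window, with $\alpha_w = \beta \max[q_w, 0]$. The function name and the sample profits are hypothetical, and integer constraints on the scaled parameters are ignored, as in the discussion above.

```python
def scaling_factors(profits, beta):
    """Factor applied to the parameters on each window w = 1, 2, ...

    profits[w-1] is q_w, the time average profit the unscaled policy
    would earn on window w; alpha_w = beta * max(q_w, 0).
    """
    factors = []
    scale = 1.0
    for q in profits:
        factors.append(scale)               # factor used on this window
        scale *= 1.0 + beta * max(q, 0.0)   # update for the next window
    return factors


# A losing window (negative q_w) leaves the scale unchanged.
print(scaling_factors([1.0, -2.0, 3.0], beta=0.5))
```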



C. Relaxing the Buying Constraint (6)

The constraint (6) can make the buying policy of the
Dynamic Trading Algorithm difficult to implement when the
number of stocks $N$ is large, as discussed after the description
of the algorithm in Section III-A. Here we consider a
simple and greedy modification that relaxes the constraint (6):
Assume the buying functions $b_n(A)$ are concave and non-decreasing.
The algorithm seeks to minimize the expression:

$$\sum_{n=1}^{N}\left[(Q_n(t) - \theta_n + V p_n(t))A_n(t) + V b_n(A_n(t))\right] \tag{41}$$

subject to $A_n(t) \in \{0, 1, \ldots, \mu_n^{max}\}$ for all $n \in \{1, \ldots, N\}$,
and subject to $\sum_{n=1}^{N} A_n(t)p_n(t) \leq x$. Consider the following
sequential algorithm for adding new shares until this
last constraint is either met or exceeded: Initialize
$A = (A_1, \ldots, A_N) = 0$. On step $k$ of the procedure, for each
$n \in \{1, \ldots, N\}$ such that $A_n < \mu_n^{max}$, compute the value
of:

$$\frac{(Q_n(t) - \theta_n + V p_n(t)) + V\left(b_n(A_n + 1) - b_n(A_n)\right)}{p_n(t)}$$

If this value is non-negative for all $n \in \{1, \ldots, N\}$, stop and
designate $A(t) = A$. Else, choose the $n$ with the smallest
(negative) such value and add one more share to the $A$ vector
in that entry $n$. If the constraint $\sum_{n=1}^{N} A_n(t)p_n(t) \leq x$ is
either met or exceeded, we are done and choose $A(t) = A$.
Else, repeat the procedure with the new $A$ vector.

The intuition behind this greedy relaxation is that we
choose to increment our allocation by one share in the stock
with the smallest (negative) ratio given by the incremental
change in (41) divided by the amount consumed in the total
money budget $x$. This procedure yields a vector $A(t)$ that
satisfies the constraints $A_n(t) \in \{0, 1, \ldots, \mu_n^{max}\}$ for all $n$,
although it may violate the constraint $\sum_{n=1}^{N} A_n(t)p_n(t) \leq x$ by
overshooting the required value $x$ with purchase of one extra
share of a particular stock. However, it has the property:

$$\sum_{n=1}^{N} A_n(t)p_n(t) \leq x + \max_{n \in \{1, \ldots, N\}} p_n^{max}$$

Therefore, we spend no more than a constant amount over our
intended constraint $x$ on each slot. It can be shown that this
greedy policy yields a value of the expression (41) that is less
than or equal to the corresponding expression that minimizes
this value subject to the original constraints (5)-(6). This is
the key property used in Lemma 2 to prove Theorems 1, 2, 3.
Hence, it can be shown that these theorems still hold under
this relaxation. Specifically, our queue sizes are still bounded
according to (25) (which was derived using only the remaining
constraints and not constraint (6)), and our time average profit
(under this relaxed policy that does not necessarily satisfy
(6)) is close to or better than the corresponding policies used
for comparison in Theorems 1, 2, 3, which do satisfy the
constraint (6).
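
The sequential procedure of this subsection can be sketched as follows. This is a minimal Python illustration under stated assumptions: the marginal change in (41) is normalized by the share price $p_n(t)$ (the "ratio" interpretation given in the intuition above), prices are strictly positive, and the fee function `b(n, a)` standing in for $b_n(a)$ is supplied by the caller. The function name and all numerical values are hypothetical.

```python
def greedy_buy(Q, theta, V, prices, mu_max, budget, b):
    """Greedy relaxation of the buying step for one slot.

    b(n, a) is the transaction fee b_n(a) for buying a shares of stock n.
    Returns the buy vector A, which may overshoot the budget by at most
    the price of one share.
    """
    N = len(Q)
    A = [0] * N
    spent = 0.0
    while spent < budget:
        best_n, best_ratio = None, 0.0
        for n in range(N):
            if A[n] >= mu_max[n]:
                continue
            # Incremental change in (41) from one more share of stock n.
            delta = (Q[n] - theta[n] + V * prices[n]) \
                + V * (b(n, A[n] + 1) - b(n, A[n]))
            ratio = delta / prices[n]  # change per unit of budget consumed
            if ratio < best_ratio:
                best_n, best_ratio = n, ratio
        if best_n is None:  # every remaining increment is non-negative
            break
        A[best_n] += 1
        spent += prices[best_n]
    return A


no_fee = lambda n, a: 0.0  # hypothetical fee function: no transaction fees
print(greedy_buy([0, 0], [100, 100], 1, [10.0, 20.0], [2, 2], 25, no_fee))
```

In this example the budget is exceeded by the last purchased share, but total spending stays within the budget plus the largest share price, consistent with the overshoot property stated above.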

VIII. Conclusion 

This work uses Lyapunov optimization theory, developed
for stochastic optimization of queueing networks, to construct
a dynamic policy for buying and selling stock. When prices
are ergodic, a single non-anticipating policy was constructed
and shown to perform close to an ideal policy with perfect
knowledge of the future, with a tradeoff in the required amount
of stock kept in each queue and in the timescales associated
with convergence. For arbitrary price sample paths, the same
algorithm was shown to achieve a time average profit close
to that of a frame based $T$-slot lookahead policy that can
look $T$ slots into the future. Our framework constrains the
maximum number of stock shares that can be bought and
sold at any time. While this restricts the long term growth
curve to a linear growth, it also limits risk by ensuring no
more than a constant value $Q_n^{max}$ shares of each stock $n$ are
kept at any time. A modified policy was briefly discussed that
achieves exponential growth by scaling $Q_n^{max}$ in proportion
to increased risk tolerance as wealth increases. These results
add to the theory of universal stock trading, and are important
for understanding optimal decision making in the presence of
a complex and possibly unknown price process.

Appendix A — Proof of Lemma 1

Here we prove Lemma 1. From the dynamics for $Q_n(t)$ in
(7) we have:

$$(Q_n(t+1) - \theta_n)^2 = (\max[Q_n(t) - \mu_n(t) + A_n(t), 0] - \theta_n)^2 \leq (Q_n(t) - \mu_n(t) + A_n(t) - \theta_n)^2 \tag{42}$$

The inequality above holds because $\theta_n \geq 0$. To see this, note
that the inequality holds with equality if $Q_n(t) - \mu_n(t) + A_n(t) \geq 0$.
In the opposite case, the result of the $\max[\cdot, 0]$
operation is 0, and we have:

$$(0 - \theta_n)^2 \leq (z - \theta_n)^2$$

where $z$ is any negative number, and so:

$$(0 - \theta_n)^2 \leq (Q_n(t) - \mu_n(t) + A_n(t) - \theta_n)^2$$

From (42) we have:

$$\frac{1}{2}(Q_n(t+1) - \theta_n)^2 - \frac{1}{2}(Q_n(t) - \theta_n)^2 \leq \frac{1}{2}(\mu_n(t) - A_n(t))^2 - (Q_n(t) - \theta_n)(\mu_n(t) - A_n(t))$$

Summing over $n \in \{1, \ldots, N\}$ and taking conditional expectations
proves that:

$$\Delta(Q(t)) \leq \frac{1}{2}\sum_{n=1}^{N} E\{(\mu_n(t) - A_n(t))^2 \mid Q(t)\} - \sum_{n=1}^{N}(Q_n(t) - \theta_n)E\{\mu_n(t) - A_n(t) \mid Q(t)\}$$

Using the definition of the constant $B$ to bound the first term on
the right hand side above yields the result.
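
The key inequality (42) is easy to check numerically. The short Python sketch below tests $(\max[x, 0] - \theta)^2 \leq (x - \theta)^2$ over a range of integer values $x = Q_n(t) - \mu_n(t) + A_n(t)$, and also shows that the inequality can fail when $\theta < 0$, which is why the condition $\theta_n \geq 0$ is needed. The function name and the tested ranges are hypothetical.

```python
def inequality_holds(theta, lo=-50, hi=50):
    """Check (max[x, 0] - theta)^2 <= (x - theta)^2 for integer x in [lo, hi]."""
    for x in range(lo, hi + 1):  # x plays the role of Q - mu + A
        lhs = (max(x, 0) - theta) ** 2
        rhs = (x - theta) ** 2
        if lhs > rhs:
            return False
    return True


print(all(inequality_holds(theta) for theta in range(0, 20)))
print(inequality_holds(-1))
```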

Appendix B — Proof of Theorem 2

A. T-Slot Drift Analysis

For the same Lyapunov function $L(Q(t))$ used in the one-slot analysis, and for
a given positive integer $T$, define the $T$-slot conditional
Lyapunov drift as follows:

$$\Delta_T(H(t)) \triangleq E\{L(Q(t+T)) - L(Q(t)) \mid H(t)\} \tag{43}$$

where $H(t)$ is the past history up to time $t$, defined as
$[Q(t), Q(t-1), \ldots, Q(0); p(t-1), p(t-2), \ldots, p(0)]$. Also
define the $T$-slot sample path drift $\hat{\Delta}_T(t)$ as:

$$\hat{\Delta}_T(t) \triangleq L(Q(t+T)) - L(Q(t))$$

With this definition, $\hat{\Delta}_T(t)$ is a random variable representing
the difference between the Lyapunov function at time $t + T$
and time $t$, and:

$$E\{\hat{\Delta}_T(t) \mid H(t)\} = \Delta_T(H(t)) \tag{44}$$

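Lemma 6: Suppose the Dynamic Trading Algorithm is implemented,
with $\theta_n$ values satisfying (23), and initial condition
that satisfies (24). Then for all $t_0 \in \{0, 1, 2, \ldots\}$, all integers
$T > 0$, and all possible values of $Q(t_0)$ we have:

$$\hat{\Delta}_T(t_0) \leq \tilde{B}T^2 - \sum_{n=1}^{N}(Q_n(t_0) - \theta_n)\sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n(\tau) - A_n(\tau)\right]$$

where $\tilde{B}$ is defined:

$$\tilde{B} \triangleq \left(\frac{1}{2} + \frac{1}{2T^2}\right)\sum_{n=1}^{N}(\mu_n^{max})^2$$

Proof: First note that:

$$(Q_n(t_0 + T) - \theta_n)^2 \leq (\mu_n^{max})^2 + \left(Q_n(t_0) - \sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n(\tau) - A_n(\tau)\right] - \theta_n\right)^2 \tag{45}$$

This can be seen as follows: If $Q_n(t_0 + T) \geq \theta_n$, then by (25)
and (23) we know that $|Q_n(t_0 + T) - \theta_n| \leq \mu_n^{max}$, and so
the square of this quantity is bounded by the first term on the
right hand side of (45), so that (45) holds in this case. Else,
suppose that $Q_n(t_0 + T) < \theta_n$. We then have:

$$\theta_n > Q_n(t_0 + T) \geq Q_n(t_0) - \sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n(\tau) - A_n(\tau)\right]$$

where the second inequality holds because the right hand side
neglects the $\max[\cdot, 0]$ in the queueing dynamics (7). It follows
that (45) again holds.
From (45) we have:

$$\frac{1}{2}\left[(Q_n(t_0 + T) - \theta_n)^2 - (Q_n(t_0) - \theta_n)^2\right] \leq \frac{(\mu_n^{max})^2}{2} + \frac{1}{2}\left(\sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n(\tau) - A_n(\tau)\right]\right)^2 - (Q_n(t_0) - \theta_n)\sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n(\tau) - A_n(\tau)\right]$$

Note that $|\mu_n(\tau) - A_n(\tau)| \leq \mu_n^{max}$ for all $\tau$, so the squared sum
is at most $T^2(\mu_n^{max})^2$. Summing the
above over $n \in \{1, \ldots, N\}$ yields the result. □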
Lemma 7: Suppose the Dynamic Trading Algorithm is implemented,
with $\theta_n$ values satisfying (23), and initial condition
that satisfies (24). Then for any times $\tau$ and $t_0$ such that
$\tau \geq t_0$, and for any given $Q(\tau)$, $Q(t_0)$, we have:

$$-V\phi(\tau) - \sum_{n=1}^{N}(Q_n(t_0) - \theta_n)(\mu_n(\tau) - A_n(\tau)) \leq 2|\tau - t_0|\sum_{n=1}^{N}(\mu_n^{max})^2 - V\phi^*(\tau) - \sum_{n=1}^{N}(Q_n(t_0) - \theta_n)(\mu_n^*(\tau) - A_n^*(\tau))$$

where $\phi(\tau)$ is defined in (8), and $\phi^*(\tau)$, $\mu^*(\tau)$, $A^*(\tau)$
represent any alternative control actions for slot $\tau$ that satisfy
the constraints (2), (3), (5), (6).

Proof: Because each queue can change by at most $\mu_n^{max}$
per slot, we have for each $n \in \{1, \ldots, N\}$:

$$-Q_n(t_0)(\mu_n(\tau) - A_n(\tau)) \leq -Q_n(\tau)(\mu_n(\tau) - A_n(\tau)) + |\tau - t_0|(\mu_n^{max})^2 \tag{46}$$

Therefore:

$$-V\phi(\tau) - \sum_{n=1}^{N}(Q_n(t_0) - \theta_n)(\mu_n(\tau) - A_n(\tau))$$

$$\leq |\tau - t_0|\sum_{n=1}^{N}(\mu_n^{max})^2 - V\phi(\tau) - \sum_{n=1}^{N}(Q_n(\tau) - \theta_n)(\mu_n(\tau) - A_n(\tau))$$

$$\leq |\tau - t_0|\sum_{n=1}^{N}(\mu_n^{max})^2 - V\phi^*(\tau) - \sum_{n=1}^{N}(Q_n(\tau) - \theta_n)(\mu_n^*(\tau) - A_n^*(\tau)) \tag{47}$$

$$\leq 2|\tau - t_0|\sum_{n=1}^{N}(\mu_n^{max})^2 - V\phi^*(\tau) - \sum_{n=1}^{N}(Q_n(t_0) - \theta_n)(\mu_n^*(\tau) - A_n^*(\tau)) \tag{48}$$

where (47) holds because, from Lemma 2, we know the
Dynamic Trading Algorithm on slot $\tau$ minimizes the left
hand side of the inequality over all alternative decisions for
slot $\tau$ that satisfy the constraints (2), (3), (5), (6) (note that
we already know $Q_n(\tau) \geq \mu_n^{max}$, and so constraint (4) is
redundant). Inequality (48) follows by an argument similar to
(46). □
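Lemma 8: Suppose the Dynamic Trading Algorithm is implemented,
with $\theta_n$ values satisfying (23), and initial condition
that satisfies (24). Then for any given slot $t_0$, all integers
$T > 0$, and all possible values of $Q(t_0)$ we have:

$$\hat{\Delta}_T(t_0) - V\sum_{\tau=t_0}^{t_0+T-1}\phi(\tau) \leq DT^2 - V\sum_{\tau=t_0}^{t_0+T-1}\phi^*(\tau) + \sum_{n=1}^{N}|Q_n(t_0) - \theta_n|\sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n^*(\tau) - A_n^*(\tau)\right]$$

where $\phi(\tau)$ is defined in (8), and $\phi^*(\tau)$, $\mu^*(\tau)$, $A^*(\tau)$
represent any alternative control actions for slot $\tau$ that satisfy
the constraints (2), (3), (5), (6). Further, the constant $D$ is
defined:

$$D \triangleq \tilde{B} + (1 + 1/T)\sum_{n=1}^{N}(\mu_n^{max})^2$$

Proof: Summing the result of Lemma 7 over
$\tau \in \{t_0, \ldots, t_0 + T - 1\}$ and using Lemma 6 yields:

$$\hat{\Delta}_T(t_0) - V\sum_{\tau=t_0}^{t_0+T-1}\phi(\tau) \leq \tilde{B}T^2 + \sum_{n=1}^{N}(\mu_n^{max})^2(T-1)T - V\sum_{\tau=t_0}^{t_0+T-1}\phi^*(\tau) - \sum_{n=1}^{N}(Q_n(t_0) - \theta_n)\sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n^*(\tau) - A_n^*(\tau)\right] \tag{49}$$

Now note that $-(Q_n(t_0) - \theta_n) = |Q_n(t_0) - \theta_n|$ if $Q_n(t_0) \leq \theta_n$.
Else, if $Q_n(t_0) > \theta_n$ then $Q_n(t_0) - \theta_n = |Q_n(t_0) - \theta_n| \leq \mu_n^{max}$
(by (25) and (23)). Thus:

$$-\sum_{n=1}^{N}(Q_n(t_0) - \theta_n)\sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n^*(\tau) - A_n^*(\tau)\right] = \sum_{n=1}^{N}|Q_n(t_0) - \theta_n|\sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n^*(\tau) - A_n^*(\tau)\right] - 2\sum_{n \in \mathcal{M}(t_0)}|Q_n(t_0) - \theta_n|\sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n^*(\tau) - A_n^*(\tau)\right]$$

where $\mathcal{M}(t_0)$ is the set of all $n \in \{1, \ldots, N\}$ such that
$Q_n(t_0) > \theta_n$. The magnitude of the final term is bounded by $2T\sum_{n=1}^{N}(\mu_n^{max})^2$.
Thus:

$$-\sum_{n=1}^{N}(Q_n(t_0) - \theta_n)\sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n^*(\tau) - A_n^*(\tau)\right] \leq \sum_{n=1}^{N}|Q_n(t_0) - \theta_n|\sum_{\tau=t_0}^{t_0+T-1}\left[\mu_n^*(\tau) - A_n^*(\tau)\right] + 2T\sum_{n=1}^{N}(\mu_n^{max})^2$$

Using this in (49) yields the result. □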

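B. The Time Average Profit

If the system satisfies the requirements specified in Lemma
8, then we can take conditional expectations of $\hat{\Delta}_T(t_0)$ to
yield (from (44)):

$$\Delta_T(H(t_0)) - V\sum_{\tau=t_0}^{t_0+T-1}E\{\phi(\tau) \mid H(t_0)\} \leq DT^2 - V\sum_{\tau=t_0}^{t_0+T-1}E\{\phi^*(\tau) \mid H(t_0)\} + \sum_{n=1}^{N}|Q_n(t_0) - \theta_n|\sum_{\tau=t_0}^{t_0+T-1}E\{\mu_n^*(\tau) - A_n^*(\tau) \mid H(t_0)\}$$

Plugging in the policy $A^*(\tau)$, $\mu^*(\tau)$ (and hence $\phi^*(\tau)$) that yields
(31), (32) gives:

$$\Delta_T(H(t_0)) - V\sum_{\tau=t_0}^{t_0+T-1}E\{\phi(\tau) \mid H(t_0)\} \leq DT^2 - VT\phi^{opt} + VT\epsilon + \sum_{n=1}^{N}\left[V p_n^{max} + \mu_n^{max}\right]T\epsilon \tag{50}$$

where we have used the fact that (by (25) and (23)):

$$|Q_n(t_0) - \theta_n| \leq V p_n^{max} + \mu_n^{max}$$

Taking expectations of (50) with respect to $H(t_0)$ yields:

$$E\{L(Q(t_0 + T)) - L(Q(t_0))\} - V\sum_{\tau=t_0}^{t_0+T-1}E\{\phi(\tau)\} \leq C_1T^2 + VTC_2\epsilon - VT\phi^{opt}$$

where $C_1$ and $C_2$ are defined:

$$C_1 \triangleq D + \frac{\epsilon}{T}\sum_{n=1}^{N}\mu_n^{max}, \qquad C_2 \triangleq 1 + \sum_{n=1}^{N}p_n^{max}$$

The above holds for all $t_0$. Summing over
$t_0 \in \{0, T, 2T, \ldots, (M-1)T\}$ for some positive integer $M$ and
dividing by $VMT$ yields:

$$\frac{E\{L(Q(MT)) - L(Q(0))\}}{VMT} - \frac{1}{MT}\sum_{\tau=0}^{MT-1}E\{\phi(\tau)\} \leq \frac{C_1T}{V} + C_2\epsilon - \phi^{opt}$$

Rearranging terms and using non-negativity of $L(\cdot)$ yields:

$$\bar{\phi}(MT) \geq \phi^{opt} - C_2\epsilon - \frac{C_1T}{V} - \frac{E\{L(Q(0))\}}{VMT}$$

Therefore (noting that the lim inf sampled every $T$ slots is the
same as the regular lim inf because $\phi(\tau)$ is bounded), we have:

$$\liminf_{t\to\infty}\bar{\phi}(t) \geq \phi^{opt} - C_2\epsilon - \frac{C_1T}{V}$$

Appendix C — Characterization of $\phi^{opt}$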

Lemma 9: The value $\phi^{opt}$ is achievable by a single $p$-only
policy that satisfies $d_n^* = 0$ for all $n \in \{1, \ldots, N\}$.

Proof: For each price vector $p$ in the finite set $\mathcal{P}$, define
$\Omega(p)$ as the set of all decision vectors $[A; \mu]$ that satisfy
(2), (3), (5), (6), where $p(t)$ is replaced with $p$ in (3) and
(6). Note that $\Omega(p)$ is finite for each $p \in \mathcal{P}$. A $p$-only
policy is characterized by a conditional probability distribution
$q(A, \mu|p)$ that satisfies:

$$\sum_{[A;\mu] \in \Omega(p)} q(A, \mu|p) = 1 \quad \text{for all } p \in \mathcal{P} \tag{51}$$

$$0 \leq q(A, \mu|p) \leq 1 \quad \text{for all } A, \mu, p \tag{52}$$

$$q(A, \mu|p) = 0 \quad \text{whenever } [A; \mu] \notin \Omega(p) \tag{53}$$

where $q(A, \mu|p)$ is defined:

$$q(A, \mu|p) \triangleq Pr[A(t) = A, \mu(t) = \mu \mid p(t) = p]$$

The collection of values $q(A, \mu|p)$ for $p \in \mathcal{P}$ and $[A; \mu] \in \Omega(p)$
can be viewed as a finite dimensional vector defined over
the compact set defined by (51)-(53). Hence, by the Bolzano-Weierstrass
theorem, any infinite sequence of such policies must
have a convergent subsequence that converges to a particular
$p$-only policy that satisfies (51)-(53). In particular, let $A^{(k)}(t)$,
$\mu^{(k)}(t)$ be an infinite sequence of $p$-only policies defined by
distributions $q^{(k)}(A, \mu|p)$ that satisfy (51)-(53), and define:

$$d_n^{(k)} \triangleq \sum_{p \in \mathcal{P}}\pi(p)\sum_{[A;\mu] \in \Omega(p)} q^{(k)}(A, \mu|p)\left[A_n - \mu_n\right]$$

$$\phi^{(k)} \triangleq \sum_{p \in \mathcal{P}}\pi(p)\sum_{[A;\mu] \in \Omega(p)} q^{(k)}(A, \mu|p)\sum_{n=1}^{N}\left[\mu_n p_n - s_n(\mu_n)\right] - \sum_{p \in \mathcal{P}}\pi(p)\sum_{[A;\mu] \in \Omega(p)} q^{(k)}(A, \mu|p)\sum_{n=1}^{N}\left[A_n p_n + b_n(A_n)\right]$$

It is clear that $d_n^{(k)}$ and $\phi^{(k)}$ correspond to the virtual drift of
stock $n$ and the virtual profit under the $p$-only policy $A^{(k)}(t)$,
$\mu^{(k)}(t)$, as defined by the time average expectations in (11),
(12). Assume that this infinite sequence of $p$-only policies
satisfies:

$$d_n^{(k)} \geq 0 \quad \text{for all } n \in \{1, \ldots, N\},\ k \in \{0, 1, \ldots\} \tag{54}$$

$$\lim_{k\to\infty}\phi^{(k)} = \phi^{opt} \tag{55}$$

Consider now any convergent subsequence of the distributions
$q^{(k)}(A, \mu|p)$ that converges to some particular distribution
$q^*(A, \mu|p)$ that satisfies (51)-(53). This defines a single $p$-only
policy. Further, by (54)-(55), this $p$-only policy must satisfy:

$$d_n^* \geq 0 \quad \text{for all } n \in \{1, \ldots, N\}, \qquad \phi^* = \phi^{opt}$$

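It remains only to show that the algorithm can be modified
to achieve $\phi^{opt}$ with $d_n^* = 0$ for all $n \in \{1, \ldots, N\}$. Suppose
the current $p$-only policy has a stock $n \in \{1, \ldots, N\}$ such that
$d_n^* > 0$. We shall create a new $p$-only policy with $\tilde{d}_n = 0$,
without reducing profit. Define:

$$\alpha_n^* \triangleq \sum_{p \in \mathcal{P}} \pi(p) \sum_{[A;\mu] \in \Omega(p)} q^*(A, \mu | p) A_n$$

$$\beta_n^* \triangleq \sum_{p \in \mathcal{P}} \pi(p) \sum_{[A;\mu] \in \Omega(p)} q^*(A, \mu | p) \mu_n$$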



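Then $d_n^* = \alpha_n^* - \beta_n^*$, and so $\alpha_n^* > \beta_n^* \geq 0$. Consider now
a new $p$-only policy $\tilde{A}(t)$, $\tilde{\mu}(t)$ defined as follows: Define
$\tilde{\mu}(t) \triangleq \mu^*(t)$ (so that selling decisions are the same). Define
$\tilde{A}_m(t) \triangleq A_m^*(t)$ for all $m \neq n$. For stock $n$, choose $\tilde{A}_n(t)$ as
follows:

$$\tilde{A}_n(t) = \begin{cases} A_n^*(t) & \text{with probability } \beta_n^* / \alpha_n^* \\ 0 & \text{otherwise} \end{cases}$$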



Note that this new p-only policy satisfies the constraints (|2]), 
©, (01, (|6]l, as the original policy satisfies these constraints, 
and we have only changed the $A^*(t)$ decision vector by
probabilistically setting the $n$th entry to zero. Also note that
the drift for all stocks $m \neq n$ is unchanged, so that $\tilde{d}_m \geq 0$
for all $m \neq n$. Further:

$$\tilde{d}_n = \alpha_n^* (\beta_n^* / \alpha_n^*) - \beta_n^* = 0$$

Thus, we have $\tilde{d}_m \geq 0$ for all $m \in \{1, \ldots, N\}$. Finally, it is
easy to see that this modification has not reduced the profit
value, and hence it must also achieve $\tilde{\phi} = \phi^{opt}$. If there are any
remaining stocks $m$ such that $\tilde{d}_m > 0$, we can repeat the same
modification procedure. This proves the existence of a $p$-only
policy that achieves $\phi^{opt}$ with $d_n^* = 0$ for all $n \in \{1, \ldots, N\}$.

□ 
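The thinning step used in the proof above can be checked numerically. The following is a minimal sketch, assuming a toy instance with a single stock, two price states, and linear fee functions; the values of `PI`, `Q_STAR`, and the fees are hypothetical and do not come from the paper. It also assumes that buying nothing costs nothing (b(0) = 0) and that prices and fees are nonnegative, which is what makes the profit non-decreasing when a purchase is replaced by zero.

```python
from fractions import Fraction as F

# Hypothetical toy instance (not from the paper): one stock, two price states.
PI = {1: F(1, 2), 3: F(1, 2)}   # assumed stationary distribution pi(p)

def sell_fee(mu):               # assumed selling fee s(mu), with s(0) = 0
    return F(1, 10) * mu

def buy_fee(a):                 # assumed buying fee b(a), with b(0) = 0
    return F(1, 10) * a

def phi(a, mu, p):
    """Single-slot profit phi(A, mu, p) for one stock."""
    return (mu * p - sell_fee(mu)) - (a * p + buy_fee(a))

# Q_STAR[p][(a, mu)] = probability of buying a and selling mu given price p.
Q_STAR = {1: {(2, 0): F(1)}, 3: {(0, 1): F(1)}}

def average(q, f):
    """Sum over p of pi(p) times sum over (a, mu) of q(a, mu | p) * f(a, mu, p)."""
    return sum(PI[p] * pr * f(a, mu, p)
               for p, dist in q.items() for (a, mu), pr in dist.items())

def drift(q):
    return average(q, lambda a, mu, p: a - mu)

def profit(q):
    return average(q, phi)

def thin(q):
    """Keep each buy decision with probability beta/alpha, else buy nothing."""
    alpha = average(q, lambda a, mu, p: a)    # average shares bought per slot
    beta = average(q, lambda a, mu, p: mu)    # average shares sold per slot
    keep = beta / alpha                       # requires alpha > 0
    new_q = {}
    for p, dist in q.items():
        out = {}
        for (a, mu), pr in dist.items():
            out[(a, mu)] = out.get((a, mu), 0) + pr * keep
            out[(0, mu)] = out.get((0, mu), 0) + pr * (1 - keep)
        new_q[p] = out
    return new_q

Q_NEW = thin(Q_STAR)
print(drift(Q_STAR), drift(Q_NEW))     # 1/2 0
print(profit(Q_STAR), profit(Q_NEW))   # 7/20 9/10
```

In this toy instance the original policy buys twice as many shares as it sells, so the thinned policy keeps each purchase with probability 1/2. The drift drops to zero and the profit rises because the discarded purchases were never sold.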

Lemma 10: If the price process $p(t)$ satisfies (10), then $\phi^{opt}$
is an upper bound on the lim sup time average profit of any
policy that satisfies dU-©. In particular, if A{t) and fi{t) 
are decisions for any policy that satisfies ©-(ISll for all t G 
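$\{0, 1, 2, \ldots\}$, then:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \phi(\tau) \leq \phi^{opt} \quad \text{with probability 1} \qquad (56)$$

and:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi(\tau)\} \leq \phi^{opt} \qquad (57)$$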

Proof: We prove only (56) (the result (57) follows from (56),
for example, using the Lebesgue Dominated Convergence
Theorem with the observation that $\phi(\tau)$ is bounded above and
below by finite constants). Because the algorithm can never sell more
stock than it has, for a given time $t$ we have:

$$\sum_{\tau=0}^{t-1} A_n(\tau) \geq \sum_{\tau=0}^{t-1} \mu_n(\tau) \quad \text{for all } n \in \{1, \ldots, N\} \qquad (58)$$



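Now for each $p \in \mathcal{P}$, define $T_p(t)$ as the set of slots $\tau \in \{0, 1, \ldots, t-1\}$
for which $p(\tau) = p$, and define $|T_p(t)|$ as
the total number of such slots. Define $\mathcal{P}(t)$ as the set of all
price vectors $p \in \mathcal{P}$ for which $|T_p(t)| > 0$. We thus have:

$$\frac{1}{t} \sum_{\tau=0}^{t-1} \phi(\tau) = \sum_{p \in \mathcal{P}(t)} \frac{|T_p(t)|}{t} \cdot \frac{1}{|T_p(t)|} \sum_{\tau \in T_p(t)} \phi(\tau)$$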



However, for each $p \in \mathcal{P}(t)$ we have:

$$\frac{1}{|T_p(t)|} \sum_{\tau \in T_p(t)} \phi(\tau) = \sum_{[A;\mu] \in \Omega(p)} \frac{N(A, \mu, p, t)}{|T_p(t)|} \phi(A, \mu, p)$$

where $N(A, \mu, p, t)$ is defined as the number of times during
the interval $\{0, \ldots, t-1\}$ that the algorithm selects $A(\tau) = A$,
$\mu(\tau) = \mu$ when $p(\tau) = p$, and where $\phi(A, \mu, p)$ is given by:

$$\phi(A, \mu, p) \triangleq \sum_{n=1}^{N} [\mu_n p_n - s_n(\mu_n)] - \sum_{n=1}^{N} [A_n p_n + b_n(A_n)]$$

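The values $N(A, \mu, p, t)$ define a $p$-only policy, given by
distribution:

$$q^{(t)}(A, \mu | p) \triangleq \begin{cases} \dfrac{N(A, \mu, p, t)}{|T_p(t)|} & \text{if } |T_p(t)| > 0 \\ \text{any fixed distribution over } \Omega(p) & \text{otherwise} \end{cases}$$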



Further, this distribution satisfies the constraints (51)-(53)
required for $p$-only policies. Now let $t_k$ be an infinite subsequence
over which the lim sup time average profit is achieved,
so that:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \phi(\tau) = \lim_{k \to \infty} \frac{1}{t_k} \sum_{\tau=0}^{t_k - 1} \phi(\tau)$$



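We thus have:

$$\frac{1}{t_k} \sum_{\tau=0}^{t_k - 1} \phi(\tau) = \sum_{p \in \mathcal{P}(t_k)} \frac{|T_p(t_k)|}{t_k} \sum_{[A;\mu] \in \Omega(p)} q^{(t_k)}(A, \mu | p) \phi(A, \mu, p) \qquad (59)$$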

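Further, with this notation, from (58) we have for each $n \in \{1, \ldots, N\}$:

$$0 \leq \frac{1}{t_k} \sum_{\tau=0}^{t_k - 1} [A_n(\tau) - \mu_n(\tau)] = \sum_{p \in \mathcal{P}(t_k)} \frac{|T_p(t_k)|}{t_k} \sum_{[A;\mu] \in \Omega(p)} q^{(t_k)}(A, \mu | p) [A_n - \mu_n] \qquad (60)$$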
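The decomposition of a sample-path time average into an empirical $p$-only policy can be illustrated on a short trajectory. The sketch below assumes a single stock, a hypothetical six-slot sample path `TRAJ`, and linear fees of 1/10 per share; all of these are illustrative choices rather than values from the paper. It builds the slot counts, the selection counts, and the empirical conditional distribution, then compares the decomposed sums with the direct time averages.

```python
from collections import Counter
from fractions import Fraction as F

def phi(a, mu, p):
    """Single-slot profit for one stock, with assumed linear fees of 1/10 per share."""
    return (mu * p - F(1, 10) * mu) - (a * p + F(1, 10) * a)

# Hypothetical sample path: (price, shares bought, shares sold) on slots 0..5.
# It never sells more stock than it holds, starting from zero shares.
TRAJ = [(1, 2, 0), (3, 0, 1), (1, 0, 0), (3, 0, 1), (1, 1, 0), (3, 0, 1)]

def empirical_policy(traj):
    """Return the price frequencies |T_p(t)|/t and the empirical q(a, mu | p)."""
    t = len(traj)
    slots = Counter(p for p, _, _ in traj)   # |T_p(t)| for each visited price
    counts = Counter(traj)                   # N(A, mu, p, t), keyed by (p, a, mu)
    freq = {p: F(c, t) for p, c in slots.items()}
    q = {key: F(n, slots[key[0]]) for key, n in counts.items()}
    return freq, q

def decomposed(freq, q, f):
    """Sum over visited p of freq(p) * sum over (a, mu) of q(a, mu | p) * f(a, mu, p)."""
    return sum(freq[p] * pr * f(a, mu, p) for (p, a, mu), pr in q.items())

FREQ, Q = empirical_policy(TRAJ)
TIME_AVG_PROFIT = sum(phi(a, mu, p) for p, a, mu in TRAJ) / len(TRAJ)
TIME_AVG_DRIFT = F(sum(a - mu for _, a, mu in TRAJ), len(TRAJ))

print(TIME_AVG_PROFIT, decomposed(FREQ, Q, phi))                        # 9/10 9/10
print(TIME_AVG_DRIFT, decomposed(FREQ, Q, lambda a, mu, p: a - mu))     # 0 0
```

Exact rational arithmetic is used so that both sides of each identity can be compared with equality rather than a floating-point tolerance.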



Because $\mathcal{P}$ is finite and $\Omega(p)$ is finite for each $p \in \mathcal{P}$, the $p$-only
distributions $q^{(t_k)}(A, \mu | p)$ can be viewed as an infinite
sequence of vectors in a compact set defined by (51)-(53),
and hence have a convergent subsequence that converges to a
distribution $q^*(A, \mu | p)$ that is in the set (51)-(53). Note by
(10) that for each $p \in \mathcal{P}$ we have:

$$\lim_{t \to \infty} \frac{|T_p(t)|}{t} = \pi(p) \quad \text{with probability 1}$$

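Taking limits of (59) and (60) thus yields:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \phi(\tau) = \sum_{p \in \mathcal{P}} \pi(p) \sum_{[A;\mu] \in \Omega(p)} q^*(A, \mu | p) \phi(A, \mu, p)$$

and for all $n \in \{1, \ldots, N\}$:

$$0 \leq \sum_{p \in \mathcal{P}} \pi(p) \sum_{[A;\mu] \in \Omega(p)} q^*(A, \mu | p) [A_n - \mu_n] \triangleq d_n^*$$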

This defines a $p$-only policy that achieves the lim sup time
average of $\phi(t)$, while yielding $d_n^* \geq 0$ for all $n$. It follows
that the lim sup time average of $\phi(t)$ must be less than or equal
to the value $\phi^{opt}$ defined as the largest such value achievable
over $p$-only policies that satisfy $d_n^* \geq 0$ for all $n$. □